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Abstract 

We  develop  a  new  approach  to  solving  minimum-cost  circulation  problems.  Our 
approach  combines  methods  for  solving  the  maximum  flow  problem  with  successive 
approximation  techniques  based  on  cost  scaling.  We  measure  the  accuracy  of  a  solution 
by  the  amount  that  the  complementary  slackness  conditions  are  violated. 

We  propose  a  simple  minimum-cost  circulation  algorithm,  one  version  of  which  runs 
in  0(n3  log(nC))  time  on  an  n- vertex  network  with  integer  arc  costs  of  absolute  value  at 
most  C.  By  incorporating  sophisticated  data  structures  into  the  algorithm,  we  obtain 
a  time  bound  of  0(nm  log(n2/m)  log( nC))  on  a  network  with  m  arcs.  A  slightly  dif¬ 
ferent  use  of  our  approach  shows  that  a  minimum-cost  circulation  can  be  computed  by 
solving  a  sequence  of  0(n  log(nC))  blocking  flow  problems.  A  corollary  of  this  result  is 
an  0(n2(log  n)  log(nC))-time,  n-processor  parallel  minimum-cost  circulation  algorithm. 
Our  approach  also  yields  strongly  polynomial  minimum-cost  circulation  algorithms. 

Our  results  provide  evidence  that  the  minimum-cost  circulation  problem  is  not  much 
harder  than  the  maximum  flow  problem.  We  believe  that  a  suitable  implementation  of 
our  method  will  perform  extremely  well  in  practice. 


1  Introduction 

The  minimum-cost  circulation  problem  is  that  of  finding  a  feasible  circulation  of  minimum 
cost  in  a  network  whose  arcs  have  flow  capacities  and  costs  per  unit  of  flow.  Extensive 
discussions  of  this  problem  and  its  applications  can  be  found  in  the  books  of  Ford  and 
Fulkerson  [13],  Jensen  and  Barnes  [32],  Lawler  [36],  Papadimitriou  and  Steiglitz  [44],  and 
Tarjan  [52]. 

Classical  algorithms  for  the  problem,  such  as  the  out-of-kilter  method  [17,41]  and  the 
cheapest  path  augmentation  method  [8,33],  run  in  exponential  time  in  the  worst  case.  All 
known  polynomial-time  algorithms  for  the  problem  rely  on  the  concept  of  scaling.  Scaling 
was  introduced  by  Edmonds  and  Karp  [12],  who  devised  the  first  polynomial-time  algorithm 
for  the  problem.  Scaling  has  also  been  applied  to  various  other  network  optimization  prob-  /^otic  x 
lems  [18,19].  Scaling  algorithms  work  by  solving  a  sequence  of  subproblems  whose  numeric  (  copy 
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parameters  more-and-more  closely  approximate  those  of  the  original  problem.  A  solution  Vi// 
to  one  subproblem  helps  to  solve  the  next  subproblem  in  the  sequence.  In  the  case  of  the 
minimum-cost  circulation  problem,  successive  subproblems  are  obtained  by  adding  one  bit  on  por 
of  precision  at  a  time  either  to  the  capacities  ( capacity  scaling )  or  to  the  costs  ( cost  scaling).  Ikkl  ^ 

3  n 

The  algorithm  of  Edmonds  and  Karp  scales  capacities.  Rock  [45]  was  the  first  to  propose  iC0(j  q 

an  algorithm  that  scales  costs.  :atlon - - 

Table  1  summarizes  known  polynomial-time  algorithms  for  the  minimum-cost  circulation 

problem,  including  those  proposed  in  this  paper.  Time  bounds  are  given  in  terms  of  the  it  I  on/ _ 

following  parameters  and  functions:  a.  j  nubility  Codes 
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0(nm  log  (n2/m)  min{log(nC),  m  log  n}) 
0(nB(n,  m)  min{log(nC),  mlogn}) 
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Table  1:  Polynomial-time  algorithms  for  the  minimum-cost  circulation  problem.  Algorithms  9 

and  10  are  presented  in  this  paper. 


n,  the  number  of  vertices; 
m,  the  number  of  arcs; 

U,  the  maximum  absolute  value  of  an  arc  capacity; 

C,  the  maximum  absolute  value  of  an  arc  cost; 

S(n,  m),  the  time  to  solve  a  single-source  shortest  path  problem  with  nonnegative  arc  costs, 
as  a  function  of  n  and  m; 

F(n,  m,  U),  the  time  to  find  a  maximum  flow,  as  a  function  of  n,  m ,  and  U ; 

B(n,m),  the  time  to  find  a  blocking  flow  on  an  acyclic  network,  as  a  function  of  n  and  m. 

In  bounds  containing  V  or  C,  the  capacities  or  costs,  respectively,  are  assumed  to  be 
integers.  The  best  known  bounds  for  S,  F,  and  B  are  as  follows: 

S(n,m)  =0(nlogn  +  ro)  [15] 

F(n,m,U)  =  0(nmlog(n2/m))  [29] 

=  0(n2  log  If  +  nm)  [1] 

B(n,m)  =  0(mlog(n2/m))  (this  paper,  Section  8.1). 


The  algorithms  in  the  table  split  into  two  classes,  those  that  use  capacity  scaling  (al¬ 
gorithms  1,  2,  5,  6,  and  8)  and  those  that  use  cost  scaling  (algorithms  3,  4,  7,  9,  and  10). 
All  the  capacity  scaling  algorithms  require  a  shortest  path  subroutine;  all  the  cost  scaling 
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algorithms  require  a  maximum  flow  or  related  subroutine.  Algorithms  4,  5,  6,  8,  9,  and 
10  are  strongly  polynomial:  their  running  times  are  polynomial  in  n  and  m  assuming  that 
arithmetic  operations  take  0(1)  time,  and  the  numbers  they  manipulate  have  a  number 
of  bits  polynomial  in  n,  m,  log  17,  and  logC.  The  book  of  Papadimitriou  and  Steiglitz  [44] 
and  the  paper  of  Tardos  [50]  contain  discussions  of  the  notion  of  a  strongly  polynomial 
algorithm. 

It  is  difficult  to  compare  the  relative  speeds  of  the  various  algorithms,  because  their  time 
bounds  depend  in  different  ways  upon  n,  m,  U,  and  C.  Among  the  algorithms  not  based 
on  our  approach  (1-8),  the  dominant  algorithms  are  1  and  2,  3  and  7,  and  8.  Under  the 
assumption  (see  [19])  that  log  U  and  logC  are  6(logn),  algorithms  1  and  2  have  a  bound 
of  0(m2  log  n  +  nm(logn)2),  whereas  algorithms  3,  7,  and  8  have  a  bound  of  0(n2m  log  n  + 
n3(logn)2),  larger  by  a  factor  of  n2/m. 

In  this  paper  we  discuss  a  general  approach  to  the  minimum-cost  circulation  problem. 
The  approach  uses  successive  approximation  based  on  cost  scaling.  It  combines  the  ideas 
Rock  [45]  used  to  develop  algorithm  3,  those  Tardos  [50]  used  to  develop  algorithm  4,  those 
Bland  and  Jensen  [7]  used  to  develop  algorithm  7,  and  those  Bertsekas  [4]  used  to  develop 
a  pseudopolynomial-time  (i.e.  polynomial  in  »,  m,  and  C)  algorithm  for  the  problem.  The 
method  works  by  finding  an  approximate  solution  and  then  iteratively  refining  the  current 
solution,  at  each  iteration  halving  the  error  parameter  e,  until  e  is  small  enough  so  that 
integrality  of  axe  costs  guarantees  that  the  current  solution  is  optimal.  To  measure  the 
accuracy  of  a  solution,  we  use  the  concept  of  e-optimality,  which  is  based  on  a  relaxation 
of  complementary  slackness  conditions.  This  concept  is  related  to  the  classical  technique  of 
perturbing  a  linear  programming  problem  to  avoid  degeneracy  (see  for  example  the  book  of 
Gass  [25]).  Tardos  [50]  and  Bertsekas  [4]  both  used  e-optimality  (in  different  ways)  in  their 
algorithms.  The  algorithm  of  Tardos  fixes  the  flow  on  edges  with  sufficiently  high  reduced 
cost.  The  algorithm  of  Bertsekas  finds  a  minimum-cost  circulation  by  locally  modifying  a 
pseudoflow  while  preserving  e-optimality.  Using  a  lemma  of  Bertsekas  and  a  generalization 
of  a  lemma  of  Tardos,  we  show  that  0(min{log(nC),  mlogn})  iterations  of  a  refinement 
subroutine  suffice  to  compute  a  minimum-cost  circulation. 

We  discuss  two  general  ways  to  implement  the  refinement  subroutine  that  is  the  heart 
of  the  algorithm.  Both  are  generalizations  of  maximum  flow  algorithms.  The  first  method, 
which  we  call  the  generic  method ,  is  a  slight  variation  of  the  algorithm  of  Bertsekas  [4] 
that  generalizes  our  maximum  flow  algorithm  [29].  The  generic  method  uses  a  sequence 
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of  transformations  to  produce  an  ^-optimal  circulation.  Bertsekas  made  the  important 
observation  that  these  transformations  preserve  c-optimality.  The  analysis  of  the  generic 
method  uses  ideas  similar  to  those  in  the  analysis  of  the  maximum  flow  algorithm  [29]. 
We  describe  a  simple  version  of  the  generic  method  with  a  running  time  of  0(n3),  giving 
an  0(n3  min{log(  nC),  m log n})  bound  for  the  minimum-cost  circulation  problem.  We  call 
this  algorithm  the  wave  method ,  because  it  resembles  the  wave  algorithm  for  computing  a 
maximum  flow  [53].  The  wave  method  was  proposed  by  Charles  Leiserson  (private  commu¬ 
nication)  and  independently  (minus  certain  implementation  details  and  the  use  of  scaling) 
by  Bertsekas  [4].  A  more  complicated  algorithm  using  two  sophisticated  data  structures,  dy¬ 
namic  trees  [48,49]  and  Anger  search  trees  [39,54],  yields  an  O(nmlog(n2/m))-time  bound 
for  refinement  and  an  0(nmlog(n2/m)min{log(nC),mlogn})-time  bound  for  finding  a 
minimum-cost  circulation. 

Our  second  implementation  of  refinement,  the  blocking  flow  method ,  uses  O(n)  calls  of 
a  blocking  flow  subroutine.  It  is  a  generalization  of  the  maximum  flow  approach  of  Dinic 
[II].  The  blocking  flow  approach  also  yields  a  simple  0(n3)-time  refinement  algorithm, 
based  on  known  0(n2)-time  blocking  flow  algorithms  [35,38,46,53],  and  a  more  complicated 
O(nmlog(n2/m))-time  refinement  algorithm,  based  on  a  new  blocking  flow  algorithm  (pre¬ 
sented  in  Section  8).  The  blocking  flow  method  also  yields  an  0(n2  logn)-time,  n-processor 
parallel  refinement  algorithm,  based  on  the  parallel  blocking  flow  algorithm  of  Shiloach  and 
Vishkin  [46].  This  gives  an  0(n2  log  n  min{log( nC),m  log  n})-time,  n-processor  algorithm 
for  the  minimum-cost  circulation  problem. 

If  C  =  0(nn'  #)  and  C  —  O(tfa(m/(nlo«n)+1)/n)  for  some  positive  constants  6  and  a,  then 
our  0(nmlog(n2/m)min{log(nC),  mlogn})  sequential  time  bound  for  finding  a  minimum- 
cost  circulation  is  as  small  as  the  bound  for  any  other  method.  Under  the  assumption  that 
log  U  and  logC  are  0(logn),  our  bound  significantly  improves  over  previous  bounds. 

This  paper  is  a  revised  and  extended  version  of  one  chapter  of  the  first  author’s  thesis  [26] 
and  of  a  conference  paper  [30].  Working  from  an  extended  abstract  of  the  conference  paper, 
Bertsekas  and  Eckstein  [5]  developed  an  0(n3  log( nC))  minimum-cost  circulation  algorithm 
using  a  more  traditional  cost-scaling  approach  in  combination  with  the  wave  method.  This 
algorithm  is  a  minor  variation  of  a  version  of  algorithm  9  and  therefore  is  not  mentioned 
in  Table  1.  The  bounds  we  derive  in  this  paper  can  also  be  derived  using  traditional  cost 
scaling;  in  fact,  the  first  versions  of  our  algorithm  used  this  approach.  The  successive 

t 


4 


approximation  approach,  however,  seems  preferable  for  at  least  two  reasons.  First,  it  allows 
the  use  of  true  costs  throughout  the  algorithm;  no  rounding  is  needed.  Second,  it  leads  very 
naturally  to  strongly  polynomial  algorithms. 

2  Foundations 

In  this  section  we  define  the  minimum-cost  circulation  problem  and  introduce  terminology 
to  be  used  throughout  the  paper.  We  also  review  conditions  for  a  circulation  to  be  optimal, 
i.e.  minimum-cost,  and  introduce  the  notion  of  e-optimality,  which  plays  a  central  role  in 
our  approach. 

The  minimum-cost  circulation  problem  is  a  generalization  of  the  maximum  flow  problem 
obtained  by  adding  a  cost  per  unit  of  flow  to  each  arc.  Since  this  problem  is  a  special  case 
of  the  linear  programming  problem,  it  is  usually  defined  in  linear  programming  terms.  Al¬ 
though  we  use  concepts  and  results  that  have  their  roots  in  linear  programming  theory,  most 
of  the  arguments  presented  in  this  paper  are  graph-theoretic.  Consequently,  we  formulate 
the  problem  in  graph-theoretic  terms.  Our  formulation  is  equivalent  to  other  formulations 
of  the  minimum-cost  flow  and  minimum-cost  circulation  problems  that  can  be  found  in 
standard  works  on  linear  programming  and  network  optimization,  including  those  cited  in 
Section  1. 

2.1  Definitions 

A  circulation  network  is  a  directed  graph  G  =  {V,  E)  with  a  capacity  u(v,w)  and  a  cost 
c(v,w)  associated  with  each  arc1  (v,  w).  We  require  G  to  be  symmetric,  i.e.  (v,w)  £  E 
implies  (u>,  v)  £  E.  We  denote  the  size  of  V  by  n  and  the  size  of  E  by  m,  and  we  assume 
(for  ease  in  stating  time  bounds)  that  m  >  n  -  1  >  1.  We  call  an  unordered  pair  {v,w} 
such  that  (v,w)  £  E  an  edge  of  G.  We  assume  that  the  cost  function  satisfies  the  following 
constraint  for  each  (v,w)  £  E: 

c(v,w)  =  —c(w,v)  (cost  antisymmetry  constraint).  (1) 

A  pseudoflow  is  a  function  /  :  E  — *•  R  satisfying  the  following  constraints  for  each 
(v,tv)  £  E: 

f(v,  w)  <  u(v,  w)  (capacity  constraint),  (2) 

'We  refer  to  ordered  pairs  of  vertices  as  arcs,  and  to  unordered  pairs  of  vertices  as  edges. 


/(«,  w)  =  -  f(w,  t>)  (flow  antisymmetry  constraint). 


(3) 


Remark:  The  flow  antisymmetry  constraints  imply  -u(w,  v)  <  f(v,  w)  <  u(v, «;);  this 
means  that  there  are  implied  lower  bounds  as  well  as  upper  bounds  on  flow.  In  the  usual 
formulation  of  the  minimum-cost  circulation  problem  these  lower  bounds  are  explicitly 
given.  We  have  chosen  the  present  formulation  because  it  simplifies  the  statement  of  con¬ 
straints  such  as  (2)  and  simplifies  some  proofs.  Our  results  easily  extend  to  allow  multiple 
arcs,  infinite  capacities,  and  costs  that  are  not  antisymmetric. 

For  a  pseudofiow  /  and  a  vertex  »,  the  excess  flow  into  t>,  e/(t>),  is  defined  by  e/(v)  = 
52(u ,v)eE  /(**,»).  A  vertex  v  with  e/(v)  >  0  is  called  active.  Observe  that  £e€v  */(»)  =  o. 

A  circulation  is  a  pseudofiow  /  such  that,  for  each  vertex  v, 

e/(v)  =  0  (flow  conservation  constraint).  (4) 

Observe  that  a  pseudoflow  /  is  a  circulation  if  and  only  if  there  are  no  active  vertices.  The 
cost  of  a  pseudoflow  /  is  given  by  the  following  expression: 

cost(f)  =  |  J2  c(v,w)f(v,w).  (5) 

(v,<u)€E 

(The  factor  of  1/2  appears  because  the  sum  counts  the  cost  of  the  flow  through  each  edge 
twice.)  The  minimum-cost  circulation  problem  is  that  of  finding  a  circulation  of  minimum 
cost. 

We  shall  need  several  other  concepts.  Let  G  be  a  directed  graph  with  arc  cost  function 
c.  The  length  of  a  path  or  cycle  in  G  is  the  number  of  arcs  it  contains.  The  cost  c(T)  of  a 
path  or  cycle  T  is  the  sum  of  the  costs  of  the  arcs  on  the  path  or  cycle.  The  mean  cost  of  a 
cycle  is  the  cost  of  the  cycle  divided  by  the  length  of  the  cycle. 

For  a  pseudoflow  /  and  an  arc  (v,tu),  the  residual  capacity  of  (v, w)  is  uj(v,w)  = 
u(v,w)—  f(v,w).  An  arc  (v,  tw)  is  saturated  if  u/(»,  tv)  =  0.  An  arc  (v,  tn)  is  a  residual  arc 
if  it  is  not  saturated,  i.e.  «/(»,»)  >  0.  The  residual  graph  Gj  =  (V,  Ej)  is  the  directed 
graph  with  vertex  set  V  and  arc  set  Ej  =  {(v,  u;)|«/(t;,tn)  >  0}. 

A  price  function  p  is  a  function  p  :  V  — *  R.  For  a  price  function  p,  the  reduced  cost 
of  an  arc  {v,w)  is  cf(v,  to)  =  c(»,  w)  —  p(v)  +  p(w).  The  flow  conservation  constraint  (4) 
implies  that  the  cost  of  a  minimum-cost  circulation  is  the  same  whether  the  original  costs  or 


the  reduced  costs  with  respect  to  some  price  function  are  used.  In  the  linear  programming 
formulation  of  the  minimum-cost  circulation  problem,  the  prices  are  the  dual  variables. 

2.2  Optimality  and  Approximate  Optimality 

There  are  two  classical  characterizations  of  minimum-cost  circulations,  both  of  which  we 
shall  use. 

Theorem  2.1  ([9])  A  circulation  f  is  minimum-cost  if  and  only  if  the  residual  graph  con¬ 
tains  no  cycle  of  negative  cost. 

Theorem  2.2  ([13])  A  circulation  f  is  minimum-cost  if  and  only  if  there  is  a  price  function 
p  such  that,  for  each  arc  (v,w), 

cp(v,w)  <  0  =>  f(v,w )  =  u (v,w)  ( complementary  slackness  constraint ).  (6) 

For  a  pseudoflow  /,  an  arc  (v,  w)  is  said  to  be  in  kilter  (after  the  out-of-kilter  method 
[17,41])  if  it  satisfies  (6)  and  out  of  kilter  otherwise.  Figure  la  shows  a  kilter  diagram  [36], 
which  is  a  pictorial  representation  of  the  complementary  slackness  constraints. 

We  need  a  notion  of  approximate  optimality,  which  we  obtain  by  relaxing  the  comple¬ 
mentary  slackness  constraints.  This  relaxation  was  previously  used  in  the  minimum-cost 
flow  algorithms  of  Tardos  [50]  and  Bertsekas  [4,6].  For  a  constant  e  >  0,  a  pseudoflow  /  is 
said  to  be  e-optimal  with  respect  to  a  price  function  p  if,  for  every  arc  (t\  w),  we  have 

Cp(v,  w)  <  — e  =►  f(v,  w)  =  u(v,w)  (e-optimality  constraint).  (7) 

A  pseudoflow  /  is  (-optimal  if  /  is  e-optimal  with  respect  to  some  price  function  p.  Figure 
lb  illustrates  the  concept  of  e-optimality  in  terms  of  kilter  diagrams.  A  circulation  /  is 
(-tight  if  /  is  e-optimal  and  for  any  e7  <  e,  /  is  not  e/-optimal.  A  circulation  /  is  (-tight 
with  respect  to  a  price  function  p  if  /  is  e-tight  and  /  is  e-optimal  with  respect  to  p. 

Observe  that  a  pseudoflow  /  is  e-optimal  with  respect  to  a  price  function  p  if  and  only  if 
cp(v,  w)  >  -(  for  every  residual  arc  (v,  w).  It  is  in  this  form  that  we  shall  use  the  definition. 

A  crucial  property  of  e-optimality  is  that  if  the  arc  costs  are  integers  and  e  is  small 
enough,  any  e-optimal  circulation  is  minimum-cost.  The  following  theorem  is  a  result  of 
Bertsekas. 


(a)  Kilter  diagram 


(b)  Approximate  optimality 


Figure  1: 

(a)  Kilter  diagram.  If  /  is  an  optimal  circulation,  the  point  corresponding  to  the  arc  ( v,w )  is 
on  the  heavy  curve. 

(b)  e-optimality.  If  /  is  e-optimal,  the  point  corresponding  to  the  arc  (v,  w)  can  be  in  the  shaded 
rectangle  as  well  as  on  the  heavy  curve. 

Theorem  2.3  ([4])  If  all  coats  arc  integers  and  e  <  1/n,  then  an  e-optimal  circulation  f  is 
minimum-cost. 

Proof:  Consider  a  simple  cycle  in  Gj.  The  e-optimality  of  /  implies  that  the  reduced  cost 
of  the  cycle  is  at  least  ne  >  -1.  The  reduced  cost  of  the  cycle  equals  its  original  cost,  which 
must  be  integral  and  hence  nonnegative.  Theorem  2.1  implies  that  /  is  minimum-cost.  | 

3  Fitting  Price  Functions  and  Tight  Error  Parameters 

The  definition  of  e-optimality  motivates  the  following  two  problems: 
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1.  Given  a  pseudoflow  /  and  a  constant  t  >  0,  find  a  price  function  p  such  that  /  is 
c-optimal  with  respect  to  p,  or  show  that  there  is  no  such  price  function  (i.e.  /  is  not 
c-optimal). 

2.  Given  a  pseudoflow  /,  find  the  c  such  that  /  is  c- tight  (i.e.  the  smallest  e  >  0  such 
that  /  is  c-optimal). 

Solutions  to  these  problems  are  needed  in  our  strongly  polynomial  minimum-cost  circulation 
method  and  can  be  used  to  give  heuristic  improvements  in  our  weakly  polynomial  method. 
The  problem  of  finding  an  optimal  price  function  given  an  optimal  circulation  is  the  special 
case  of  problem  1  with  c  =  0.  In  this  section  we  shall  show  that  both  problems  can  be 
solved  in  0(nm)  time,  the  first  by  doing  a  Bellman-Ford  shortest  path  computation  (see 
e.g.  [52]),  the  second  by  applying  an  algorithm  of  Karp  [34]  for  computing  the  minimum 
mean  cost  of  a  cycle  in  a  directed  graph  with  arc  costs.  We  also  show  that  an  important 
special  case  of  problem  1  arising  in  our  weakly  polynomial  method  can  be  solved  in  0(m) 
time,  by  using  the  Dial- Wagner  implementation  [10,55]  of  Dijkstra’s  shortest  path  algorithm 
(see  e.g.  [52]). 

To  address  these  problems,  we  need  some  results  about  shortest  paths  and  shortest  path 
trees  (see  e.g.  [52]).  Let  G  be  a  directed  graph  with  a  distinguished  source  vertex  s  and  a 
cost  c(v,tv )  on  every  arc  (v,  to).  For  a  spanning  tree  T  rooted  at  s,  the  tree  cost  function 
d  :  V  — *•  R  is  defined  recursively  as  follows:  d(s)  =  0,  d(v )  =  d(parent(v ))  +  c(parent(v),v) 
for  v  £  V  -  {s},  where  parent(v)  is  the  parent  of  v  in  T.  A  spanning  tree  T  rooted  at 
s  is  a  shortest  path  tree  if  and  only  if,  for  every  vertex  v,  the  path  from  s  to  v  in  T  is  a 
minimum-cost  path  from  s  to  t;  in  G,  i.e.  d(v)  is  the  cost  of  a  minimum-cost  path  from  s 
to  v. 

Lemma  3.1  (see  e.g.  [52])  Graph  G  contains  a  shortest  path  tree  if  and  only  if  G  contains 
no  negative-cost  cycle. 

Lemma  3.2  (see  e.g.  [52])  A  spanning  tree  T  rooted  at  s  is  a  shortest  path  tree  if  and  only 
if  c(v,w)  -1-  d(v)  >  d(w)  for  every  arc  (v,w)  in  G. 

3.1  Finding  a  Fitting  Price  Function 

Consider  problem  1:  given  a  pseudoflow  /  and  a  nonnegative  €,  find  a  price  function  p 
with  respect  to  which  /  is  c-optimal,  or  show  that  /  is  not  c-optimal.  Define  a  new  cost 
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function  cW  :£-*■  R  by  c^(v,w)  =  c(v,w)  +  e.  Extend  the  residual  graph  G/  by 
adding  a  single  vertex  s  and  arcs  from  it  to  all  other  vertices  to  form  an  auxiliary  graph 
Gaux  ~  (Va*»x>  Eaux)  =  (VU{«}(JE/U({i}xV)).  Extend  to  Gau®  by  defining  c(e)(s,u)  =  0 
for  every  arc  («,»),  where  v  €  V.  Note  that  every  vertex  is  reachable  from  s  in  Gaux. 

Theorem  3.3  Pseudoflow  f  is  e-optimal  if  and  only  ifGaux  (or  equivalently  Gf)  contains 
no  cycle  of  negative  c^-cost.  IfT  is  any  shortest  path  tree  of  Gaux  (rooted  at  s)  with  respect 
to  the  arc  cost  function  c^*\  and  d  is  the  associated  tree  cost  function,  then  f  is  e-optimal 
with  respect  to  the  price  function  p  defined  by  p(v)  =  -d(v)  for  all  v  €  V. 

Proof:  Suppose  /  is  e-optimal.  Any  cycle  in  Gaux  is  a  cycle  in  Gf,  since  vertex  s  has 
no  incoming  arcs.  Let  T  be  a  cycle  of  length  l  in  Gaux-  Then  c(r)  >  -It,  which  implies 
c(‘)(r)  =  e(r)  +  le  >  0.  Therefore  Gaux  contains  no  cycle  of  negative  c^-cost. 

Suppose  Gaux  contains  no  cycle  of  negative  ci*)- cost.  Then  by  Lemma  3.1  Gaux  has 
some  shortest  path  tree  rooted  at  s.  Let  T  be  any  such  tree  and  let  d  be  the  tree  cost 
function.  By  Lemma  3.2,  c(‘)(u,  tu)  +  d(t>)  >  d(w)  for  all  (v,w)  e  Ej ,  which  is  equivalent  to 
cW(t>,  w)  -f  d(v)  —  d(w)  >  — c  for  all  (v,  w)  €  Ej.  But  these  are  the  e-optimality  constraints 
for  the  price  function  p  =  —d.  Thus  /  is  e-optimal  with  respect  to  p.  | 

Remark:  Theorem  3.3  holds  for  any  assignment  of  costs  to  the  arcs  (s,  v).  We  have  assigned 
zero  costs  to  these  edges  merely  for  simplicity. 

Using  Theorem  3.3,  we  can  solve  problem  1  by  constructing  the  auxiliary  graph  GOUx 
and  finding  either  a  shortest  path  tree  or  a  negative-cost  cycle.  Constructing  Gaux  takes 
0(m)  time.  Finding  a  shortest  path  tree  or  a  negative-cost  cycle  takes  O(nm)  time  using 
the  Bellman-Ford  shortest  path  algorithm  (see  e.g.  [52]). 

An  important  special  case  of  problem  1  arises  in  connection  with  Theorem  2.3.  Suppose 
the  arc  cost  function  c  is  integral,  /  is  a  pseudoflow  c-optimal  with  respect  to  a  price  function 
P<,  and  e  <  1/n.  By  Theorem  2.3,  /  is  optimal.  The  question  arises  of  how  to  compute  a 
price  function  p  with  respect  to  which  /  is  optimal.  Knowing  pf  helps  to  find  such  a  price 
function;  we  shall  show  how  this  computation  can  be  done  using  Dijkstra’s  shortest  path 
algorithm. 

We  use  a  variant  of  the  previous  construction.  Define  Gaui  as  above: 

Gaux  =  (Kiurt  Ea ux)  =  (V  U  {■#},  Ej  U  ({s}  X  V^)). 
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Extend  pt  and  c  to  Gaux  by  defining  pe(s)  =  0,  c(s,v )  =  -[p«(t>)J  for  all  v  E  V.  Define  a 
new  cost  function  c!  on  Eaux  as  follows: 

c'(v,  W)  =  Cpt(v,  W)  +  €  =  C(V ,  W)  -  pt(v)  +  pc(w)  +  €. 

Observe  that  £-optimality  implies  c'(v,  w)  >  0  for  all  (v,w)  E  Ef,  and  the  definition  of 
c'  implies  that  0  <  c^s,  v)  <  1  +  £  for  all  v  E  V. 

Theorem  3.4  Let  T  be  a  shortest  path  tree  in  Gaux  (rooted  at  s)  with  respect  to  the  cost 
function  cf .  Let  dt  be  the  tree  cost  function  on  T  with  respect  to  the  arc  cost  function  cPt. 
Then  f  is  optimal  with  respect  to  the  price  function  p  defined  by  p(v)  =  pe(t>)  —  d((v). 

Proof:  Let  (v,w)  €  Ef.  We  must  prove  that  c(v,  w)  -  pe(v)  +  d((v)  +  Pt(«0  -  de(w)  >  0, 
i.e.  cPt(v,w)  +  de(v)  >  de(w).  Let  d!  be  the  tree  cost  function  with  respect  to  the  arc  cost 
function  c'.  The  definitions  of  c'  and  d!  imply  that  c'(v,  to)  =  cPt(v,  to)+ne,  d'(v)  <  dc(v)+n e, 
and  d'(w )  >  de(w)  +  e.  Combining  these  with  the  inequality  c'(v,  to)  +  d'(v)  >  d'(w)  given 
by  Lemma  3.2,  we  find  that 

cPt(v,  to)  +  dt(v )  >  dt(w)  -  ne  >  df(to)  -  1. 

But  the  integrality  of  c  and  the  definitions  of  cPt  and  dt  imply  that  Cpe(v,w)  +  dc(v)  = 
i  +  pe(w)  and  de(w)  =  j  +  p€(w)  for  some  integers  i  and  j.  Since  i  +  pe(w)  >  j  +  pe(to)  -  1 
implies  that  i  >  j ,  we  must  have  Cp,(v,  to)  +  dt(v)  >  dc(w).  | 

By  Theorem  3.4,  we  can  compute  a  price  function  p  with  respect  to  which  /  is  optimal 
by  solving  a  single  source  shortest  path  problem  on  a  graph  with  nonnegative  arc  costs 
and  doing  O(m)  additional  work  (to  construct  Ga ux  and  to  compute  p  given  a  shortest 
path  tree).  We  can  solve  the  shortest  path  problem  in  0(m  +  nlogn)  time  with  Dijkstra’s 
algorithm  implemented  using  Fibonacci  heaps  [15]. 

If  1/e  is  an  integer  of  size  0(n)  and,  for  every  r,  pt(v)  is  an  integer  multiple  of  e ,  then 
the  time  to  solve  the  shortest  path  problem  can  be  reduced  to  0(m).  In  this  case,  c'(v,  w) 
is  an  integer  multiple  of  e  for  every  (v,  w).  Since  in  this  case  c'(s,  v)  <  1  for  every  v  E  V,  it 
follows  that  d’(v)  <  1.  Thus  if  all  edge  costs  are  multiplied  by  1/c,  the  result  is  a  shortest 
path  problem  with  nonnegative  integer  arc  costs  such  that  all  minimum-cost  paths  from  s 
have  cost  0(n).  The  algorithm  of  Dial  [10]  and  Wagner  [55]  will  solve  such  a  problem  in 
0(m)  time. 


3.2  Finding  a  Tight  Error  Parameter 


Let  os  turn  to  problem  2:  given  a  pseudoflow  /,  find  the  e  such  that  /  is  e-tight.  We 
need  a  definition.  For  a  directed  graph  G  with  arc  cost  function  c,  the  minimum  cycle 
mean  of  G,  denoted  by  p(G,  c),  is  the  minimum  mean  cost  of  a  cycle  in  G.  Karp  [34]  has 
given  an  O(nm)  algorithm  to  compute  p(G,c).  This  problem  is  a  special  case  of  the  so- 
called  “tramp  steamer”  problem;  see  [36]  for  additional  references.  The  connection  between 
minimum  cycle  means  and  tight  error  parameters  is  given  by  the  following  theorem: 


Theorem  3.5  For  a  pseudoflow  f,  the  c  for  which  f  is  e-tight  is  e  =  max{0,  -/z(G/,c)}. 


Proof:  Consider  any  cycle  T  in  Gj.  Let  the  length  of  T  be  l.  For  any  c,  let  be  the  cost 
function  defined  in  Section  3.1:  cM( t>,  w)  =  c(v,  w)  +  e  for  (r,  w)  €  Ef.  Let  e  be  such  that  / 
is  c-tight,  and  let  p  =  p(Gj,c).  By  Theorem  3.3,  0  <  c(t)(r)  =  c(r)  +  1c,  i.e.  c(T)//  >  -c. 
Since  this  is  true  for  any  cycle  T,  p  >  -c,  i.e.  e  >  -p.  Conversely,  for  any  cycle  T, 
c(T)/l  >  p,  which  implies  c^“M)(r)  >  0.  By  Theorem  3.3,  this  implies  max{0,  —p}  >  c.  | 

By  Theorem  3.5  a  tight  error  parameter  can  be  computed  in  O(nm)  time  by  applying 
Karp ’8  minimum  cycle  mean  algorithm. 

We  conclude  this  section  with  an  observation  regarding  the  structure  of  c-tight  pseud¬ 
oflows.  Suppose  /  is  an  c-tight  pseudoflow  and  c  >  0.  Let  p  be  a  price  function  such  that 
/  is  c-optimal  with  respect  to  p,  and  let  T  be  a  cycle  in  Gj  with  mean  cost  -c.  Since  -c  is 
a  lower  bound  on  the  reduced  cost  of  an  arc  in  G /,  every  arc  of  F  must  have  reduced  cost 
exactly  c. 


4  The  Minimum-Cost  Flow  Framework 

In  this  section  we  give  a  high-level  description  of  our  minimum-cost  circulation  algorithms. 
First  we  describe  a  successive  approximation  framework.  The  running  times  of  the  algo¬ 
rithms  developed  within  this  framework  depend  on  C,  the  biggest  absolute  value  of  the  costs. 
Then  we  show  that  a  natural  modification  of  this  framework  leads  to  strongly  polynomial 
algorithms. 
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procedure  min-coat(V,  E,u,c); 

[initialization] 

« «-  C; 

Vt>,  p(v)  *—  0; 

if  3  a  circulation  then  /  «—  a  circulation  else  return(null); 
[loop] 

while  £  >  1/n  do 

(f./.P)  —  rtfine(t,f,py, 
return(/); 

end. 


Figure  2:  The  successive  approximation  algorithm.  Versions  of  refine  described  in  this  paper 
decrease  c  by  a  factor  of  two. 

4.1  The  Successive  Approximation  Framework 

In  this  section  we  give  a  high-level  description  of  our  successive  approximation  algorithm  (see 
Figure  2).  We  assume  here  that  the  arc  cost  function  is  integral.  The  algorithm  maintains 
a  circulation  ft  and  a  price  function  pt,  such  that  ft  is  e-optimal  with  respect  to  pt.  The 
algorithm  starts  with  e  =  C  (or  alternatively  e  =  2flo*»cl),  with  p(v)  =  0  for  all  veV,  and 
with  any  initial  circulation.  An  initial  circulation  can  be  found  using  one  invocation  of  any 
maximum  flow  algorithm.  (See  e.g.  [44],  problem  11(e),  p.  215.)  Any  initial  circulation  is 
C-optimal.  The  algorithm  maintains  the  invariant  that  the  circulation  /  is  e-optimal  with 
respect  to  the  price  function  p.  The  main  loop  of  the  algorithm  repeatedly  reduces  the  error 
parameter  e.  When  e  <  1/n,  the  current  circulation  is  minimum-cost,  and  the  algorithm 
terminates. 

Reducing  the  error  in  the  optimality  of  the  current  circulation  is  the  task  of  the  sub¬ 
routine  refine.  The  input  to  refine  is  an  error  parameter  e,  a  circulation  /,  and  a  price 
function  p  such  that  /  is  e-optimal  with  respect  to  p.  The  output  from  refine  is  a  reduced 
error  parameter  e,  a  new  circulation  /,  and  a  new  price  function  p  such  that  /  is  e-optimal 
with  respect  to  p.  The  implementations  of  refine  described  in  this  paper  reduce  the  error 
parameter  e  by  a  constant  factor  (two  unless  mentioned  otherwise). 

The  correctness  of  the  algorithm  is  immediate  from  Theorem  2.3,  assuming  that  refine 
is  correct.  The  number  of  iterations  of  refine  is  O(log( nC)).  The  initialization  is  dominated 
by  the  time  to  compute  a  circulation,  which  is  O(nmlog(n3/m))  [29].  The  bounds  we  shall 
obtain  for  various  versions  of  refine  are  all  ft(mn  log( nJ/m)).  Thus  the  iterations  of  refine 
dominate  the  running  time  of  the  algorithm,  giving  us  the  following  theorem: 
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Theorem  4.1  A  minimum-coat  circulation  can  be  computed  in  the  time  required  for 
0(log(nC))  iterations  of  refine,  if  refine  reduces  €  by  a  factor  of  two. 


Remark:  The  minimum-cost  circulation  algorithm  need  not  begin  with  a  circulation;  it  can 
begin  with  any  pseudoflow.  If  the  problem  is  infeasible,  this  fact  will  be  discovered  during 
the  flrst  invocation  of  refine.  Beginning  with  a  circulation  simplifies  the  presentation  below, 
because  we  do  not  have  to  worry  about  feasibility. 

Remark:  Some  applications  require  an  optimal  price  function  in  addition  to  an  optimal 
circulation.  Once  an  optimal  circulation  /  is  found,  a  price  function  p  with  respect  to 
which  /  is  optimal  can  be  computed  in  0(m)  time  as  described  in  Section  3.1,  assuming 
that  the  initial  value  of  e  was  chosen  to  be  a  power  of  two. 

We  conclude  this  section  by  comparing  our  approximation  approach  with  the  traditional 
cost-scaling  algorithms  of  Rock  [45]  and  of  Bland  and  Jensen  [7].  These  algorithms  use  a 
maximum  flow  routine  in  the  inner  loop.  Our  algorithms  are  similar  to  these  cost-scaling  al¬ 
gorithms  in  that  refine,  as  we  implement  it,  can  be  viewed  as  a  generalization  of  a  maximum 
flow  algorithm.  The  previous  algorithms,  however,  require  O(n)  maximum- flow  computa¬ 
tions  to  halve  the  error  parameter,  whereas  our  algorithm  requires  only  one  invocation  of 
refine.  This  accounts  for  our  improved  time  bounds. 

The  traditional  cost-scaling  approach,  where  precision  of  the  costs  is  increased  bit-by-bit, 
can  be  used  instead  of  our  successive  approximation  approach.  We  used  traditional  scaling 
in  the  first  versions  of  our  algorithm;  later,  traditional  scaling  was  used  by  Bertsekas  and 
Eckstein  [5].  Suppose  that  the  current  precision  of  costs  is  e  and  we  have  an  e-optimal 
circulation.  If  we  add  the  next  bit  of  costs,  the  circulation  is  (3/2)c-optimal  with  respect 
to  the  resulting  cost  function.  Applying  a  version  of  the  refine  subroutine  that  reduces 
the  error  parameter  by  a  factor  of  three,  we  obtain  an  e/2-optimal  circulation.  Repeating 
this  step  0(log(nC))  times,  we  obtain  an  optimal  circulation.  Asymptotic  bounds  for  this 
approach  are  the  same  as  for  the  successive  approximation  approach,  but  the  constant 
factors  are  somewhat  worse  because  of  the  additional  errors  introduced  when  new  bits  of 
costs  are  added.  A  potential  advantage  of  this  approach  is  that  a  smaller  number  of  bits 
must  be  considered  when  manipulating  the  costs.  For  most  practical  applications,  however, 
the  costs  are  small  and  word  operations  can  be  used  on  them. 


procedure  min-cost(V,  E,  ti,  c); 

[initialization] 

Vv,  p(v)  «-  0; 

if  3  a  circulation  /  then  f  *—  a  circulation  else  return(null); 
[loop] 

while  €  >  0  do  begin 

(*)  find  A  and  p\  such  that  /  is  A-tight  with  respect  to  pxi 
if  A  >  0  then  (e,/,p)  «-  rtfine(\,  f,p\) 
else  return(/); 

end. 


Figure  3:  The  strongly  polynomial  algorithm.  Versions  of  refine  described  in  this  paper  decrease 
€  by  a  factor  of  two. 


4.2  The  Strongly  Polynomial  Framework 

The  minimum-cost  circulation  algorithm  of  Section  4.1  has  the  disadvantage  that  the  num¬ 
ber  of  iterations  of  refine  depends  on  the  magnitudes  of  the  costs.  If  the  costs  are  huge 
integers,  the  method  need  not  run  in  time  polynomial  in  n  and  m;  if  the  costs  are  irrational, 
the  method  need  not  even  terminate.  In  this  section  we  show  that  a  natural  modification 
of  our  successive  approximation  approach  produces  strongly  polynomial  algorithms,  i.e. 
algorithms  running  in  time  polynomial  in  n  and  m,  assuming  arbitrary  real  numbers  as 
capacities  and  costs  and  infinite-precision  arithmetic.  The  running  time  bounds  we  shall 
derive  for  algorithms  based  on  the  approach  of  Section  4.1  are  valid  for  the  modified  ap¬ 
proach  presented  in  this  section.  Unlike  the  previous  strongly  polynomial  algorithms,  our 
approach  uses  no  rounding:  true  capacities  and  costs  are  used  throughout. 

The  changes  needed  to  make  our  approach  strongly  polynomial  are  to  add  an  extra 
computation  to  the  main  loop  of  the  algorithm  and  to  change  the  termination  condition. 
Before  calling  refine  to  reduce  the  error  parameter  e,  the  new  method  computes  the  value 
A  and  a  price  function  p\  such  that  the  current  circulation  /  is  A-tight  with  respect  to  p\. 
The  strongly  polynomial  method  is  described  on  Figure  3.  The  value  of  A  and  the  price 
function  p\  in  line  (+)  are  computed  as  described  in  Section  3.  The  algorithm  terminates 
when  the  circulation  /  is  optimal,  i.e.  A  =  0. 

The  time  to  perform  line  (*)  is  O(nm)  by  the  results  of  Section  3.  Since  all  the  im¬ 
plementations  of  refine  that  we  shall  consider  have  a  time  bound  greater  than  O(nm),  the 
extra  time  per  iteration  in  the  new  version  of  the  algorithm  exceeds  the  time  per  iteration 
in  the  original  version  by  less  than  a  constant  factor.  Since  each  iteration  at  least  halves  e, 


the  bound  of  0(k>g(nC))  on  the  number  of  iterations  derived  in  Section  4.1  remains  valid, 
assuming  that  the  costs  are  integral. 

For  arbitrary  real-valued  costs,  we  shall  derive  a  bound  of  O(mlogn)  on  the  number 
of  iterations,  assuming  that  the  refine  subroutine  used  decreases  the  error  parameter  by  a 
constant  factor.  We  start  by  proving  the  following  theorem,  which  is  a  slight  generalization 
of  a  lemma  of  Tardos  [50].  (To  obtain  the  lemma  of  Tardos,  take  d  =  0.) 

Theorem  4.2  Let  e  >  0,  d  >  0  be  constants.  Suppose  that  a  circulation  f  is  (-optimal 
with  respect  to  a  price  function  p,  and  that  there  is  an  arc  (r,  w)  £  E  such  that 
n(e  +  d).  Then  for  any  d -optimal  circulation  f,  we  have  f(v,w)  =  /'(t>,  w). 

Proof ;  By  antisymmetry,  it  is  enough  to  prove  the  theorem  for  the  case  cp(v,  w)  >  n(c  +  e'). 
Let  f  be  a  circulation  such  that  /'(»,to)  /  /(»,  to).  Since  Cp{v,w)  >  (,  the  flow  through 
the  arc  (o,to)  must  be  as  small  as  the  capacity  constraints  allow,  namely  -tt(w,t>),  and 
therefore  /'(o,  w)  ^  f(v,w)  implies  /'(»,  to)  >  /(«,  w).  We  show  that  /'  is  not  ('-optimal, 
and  the  theorem  follows. 

Consider  G>  =  {(*,y)  €  E\f(z,y)  >  /(*,y)}.  Note  that  G>  is  a  subgraph  of  G/,  and 
(®,  to)  is  an  arc  of  6r>.  Since  /  and  f  are  circulations,  G>  must  contain  a  simple  cycle  T 
that  passes  through  (u,  to).  Let  /  be  the  length  of  T.  Since  all  arcs  of  T  axe  residual  arcs, 
the  cost  of  T  is  at  least 

Cp(o,to)  -  (/  -  l)c  >  n(f  +  d)  -  (n  -  l)c  >  nd. 

Now  consider  a  cycle  F  obtained  by  reversing  the  arcs  on  T.  Note  that  F  is  a  cycle  in 
G<  =  {(*iV)  €  E\f'(x,y)  <  /(x,y)}  and  therefore  a  cycle  in  Gj>.  By  antisymmetry,  the 
cost  of  F  is  less  than  -nd  and  thus  the  mean  co6t  of  T  is  less  than  -d.  Theorem  3.5  implies 
that  /'  is  not  ('-optimal.  | 

To  state  an  important  corollary  of  Theorem  4.2,  we  need  the  following  definition.  We 
say  that  an  arc  (®,  to)  €  E  is  (-fixed  if  the  flow  through  this  arc  is  the  same  for  all  e-optimal 
circulations. 

Corollary  4.3  Let  e  >  0,  suppose  f  is  (-optimal  with  respect  to  a  price  function  p,  and 
suppose  that  for  some  arc  (v,  to),  |c,,(w,  to)|  >  2ne.  Then  (o,  to)  is  (-fixed. 

Define  Ft  to  be  the  set  of  (-fixed  arcs. 


Lemma  4.4  Assume  d  <  Suppose  that  there  exists  an  e-tight  circulation  f.  Then  Ft> 
properly  contains  Ft. 


Proof:  Since  every  e'-optimal  circulation  is  e-optimal,  we  have  Ft  C  Ft>.  To  show  that  the 
containment  is  proper,  we  have  to  show  that  there  is  an  c'-fixed  arc  that  is  not  f -fixed. 

Since  the  circulation  /  is  e-tight,  there  exists  a  price  function  p  such  that  /  is  e-optimal 
with  respect  to  p,  and  there  exists  a  simple  cycle  T  in  G/  with  a  mean  cost  of  -e  (see 
Section  3.2).  Since  increasing  /  along  T  preserves  e-optimality,  arcs  of  T  are  not  e-fixed. 

We  show  that  at  least  one  arc  of  T  is  e'-fixed.  Let  /'  be  a  circulation  that  is  e'-optimal 
with  respect  to  some  price  function  pf.  Since  the  mean  cost  of  T  is  — e,  there  is  an  arc  (t>,  w) 
of  T  with  Cpi(v,w)  <  —  e  <  -2nd;  by  antisymmetry  cp>(w,  v)  >  2nd.  By  Corollary  4.3,  the 
arc  (t?,  w)  is  e'-fixed.  But  (»,tn)  is  not  e-fixed  since  ( v,w )  is  an  arc  of  T.  | 

As  the  strongly  polynomial  algorithm  runs,  the  value  of  the  error  parameter  e  decreases; 
each  time  this  value  decreases  by  a  factor  of  2n,  at  least  one  arc  becomes  fixed.  If  all 
arcs  are  fixed  with  respect  to  the  current  value  of  e,  the  current  circulation  is  optimal  and 
the  algorithm  terminates  during  the  next  execution  of  line  (*).  (Since  not  every  arc  is 
necessarily  0- fixed,  the  algorithm  may  terminate  before  all  arcs  are  fixed.) 

Theorem  4.5  If  refine  reduces  the  error  parameter  e  by  a  constant  factor,  the  total  number 
of  iterations  of  the  while  loop  is  0(m  log  n). 

Proof:  Consider  a  time  during  the  execution  of  the  algorithm.  During  the  next  O(logn) 
iterations,  either  the  algorithm  terminates,  or  the  error  parameter  is  reduced  by  a  factor 
of  2n.  In  the  latter  case,  Lemma  4.4  implies  that  an  arc  becomes  fixed.  If  all  arcs  become 
fixed,  the  algorithm  terminates  in  one  iteration  of  the  loop.  Therefore  the  total  number  of 
iterations  is  0(m  log  n).  | 

Section  7  contains  a  version  of  our  minimum-cost  circulation  algorithm  that  runs  in 
0(nmlog(n2/m)min{log(nC),  mlogn})  time.  If  the  arc  costs  are  huge,  the  strongly  poly¬ 
nomial  algorithm  of  Gali!  and  Tardos  [23],  which  runs  in  O(n2(logn)(nlogn  +  m))  time,  is 
faster  than  our  algorithm  except  on  sparse  graphs.  Perhaps  our  algorithm  can  be  improved 
to  run  in  0(n  log  n)  iterations,  possibly  by  using  some  of  the  ideas  in  the  Galil-Tardos  paper. 
If  this  conjecture  is  true,  our  method  would  be  competitive  with  that  of  Galil  and  Tardos 
on  both  sparse  and  dense  graphs,  and  slower  by  a  factor  of  log  n  on  graphs  of  intermediate 
density. 
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procedure  rtfine((,  f,  p); 

[initialization] 
c  — e/2; 

V(t>,  w)  €  E  do  if  Cp(v,  w)  <  0  then  f(v,  w)  *—  u( v,  w); 

[loop] 

while  3  a  flow  or  price  update  operation  that  applies  do 
select  such  an  operation  and  apply  it; 
retum(c,/,p); 

end. 

Figure  4:  The  generic  refine  subroutine. 

5  A  Generic  Refinement  Algorithm 

In  this  section  we  describe  an  implementation  of  refine  that  is  a  generalization  of  our 
maximum  flow  algorithm  [27,29].  We  call  this  the  generic  implementation.  The  generic 
refine  subroutine  is  a  slight  variant  of  the  earlier  minimum-cost  flow  algorithm  of  Bertsekas 
[4].  We  use  the  subroutine  in  a  different  way,  however.  Whereas  we  use  the  subroutine 
repeatedly,  each  time  to  halve  the  error  parameter  e,  Bertsekas  computes  a  minimum-cost 
flow  by  choosing  an  e  small  enough  that  a  single  run  of  refine  will  convert  an  arbitrary 
circulation  into  an  optimal  circulation.  Unfortunately,  the  latter  approach  results  in  only 
a  pseudopolynomial-time  algorithm,  whereas  we  shall  derive  small  polynomial  time  bounds 
for  various  versions  of  our  algorithm. 

As  we  have  mentioned  in  Section  4.1,  the  input  and  output  to  refine  is  a  triple  (e,  /e,p<). 
The  main  effect  of  refine  is  to  reduce  e  by  a  factor  of  two. 

The  generic  refine  subroutine  is  described  on  Figure  4.  It  begins  by  halving  e  and 
saturating  every  arc  with  negative  reduced  cost.  This  converts  the  circulation  /  into  an 
e-optimal  pseudoflow  (indeed,  into  a  0-optimal  pseudoflow).  Then,  the  subroutine  converts 
the  (-optimal  pseudoflow  into  an  (-optimal  circulation  by  applying  a  sequence  of  flow  and 
price  update  operations ,  each  of  which  preserves  (-optimality.  The  idea  of  preserving  c- 
optimality  in  this  context  is  due  to  Bertsekas  [4]. 

The  inner  loop  of  our  generic  algorithm  consists  of  repeatedly  applying  the  two  kinds 
of  update  operations  described  in  Figure  5,  in  any  order,  until  no  such  operation  applies. 
Figure  6  illustrates  the  update  operations  in  terms  of  kilter  diagrams. 

A  push  operation  applies  to  a  residual  arc  (v,  w)  of  negative  reduced  cost,  such  that 
vertex  v  is  active.  It  consists  of  pushing  6  =  min{c/(v),  u/(r,  u>)}  units  of  flow  from  v  to 


push(v,w). 

Applicability:  v  is  active,  uj(v,w)  >  0,  and  cp(v,w)  <  0. 

Action:  send  6  —  min(e/(v),u/(v,u>))  units  of  flow  from  u  to  w. 

rtlabtl(v). 

Applicability:  v  is  active  and  Vu<  €  V  uj(v,  w)  >  0  =►  cp( v,  w)  >  0. 

Action:  replace  p(v)  by  min(t,  w)€B/(p(u;)  +  c(t>,  u>)  +  c). 

Figure  5:  The  flow  and  price  update  operations. 

w,  thereby  decreasing  e/(v)  and  /( w,v)  by  6  and  increasing  eflw)  and  f(v,w )  by  6.  The 
push  is  saturating  if  uj(v,w)  =  0  after  the  push  and  nonsaturating  otherwise. 

A  relabel  operation  applies  to  an  active  vertex  v  that  has  no  exiting  residual  arcs  with 
negative  reduced  cost.  It  consists  of  increasing  p(v)  to  the  highest  value  allowed  by  the 
c-optimality  constraints,  namely  min(UiU))€£/{c(v,ty)  +  p(w)  +  e}.  (If  such  a  minimum  is 
taken  over  the  empty  set,  the  problem  is  infeasible.) 

The  following  lemmas,  which  generalize  corresponding  results  of  [4,29],  give  important 
properties  of  the  update  operations. 

Lemma  5.1  If  the  problem  is  feasible,  a  pseudoflow  f  is  e-optimal  tvith  respect  to  a  price 
function  p,  and  v  is  an  active  vertex,  then  either  a  push  is  applicable  to  some  arc  (v,  w)  or 
a  relabeling  is  applicable  to  v. 

Proof:  Obvious.  | 

Remark:  If  the  problem  is  infeasible  and  we  begin  with  any  pseudoflow,  then  during  the  first 
iteration  of  refine  there  will  be  a  vertex  v  with  positive  excess  and  no  outgoing  residual  arcs. 
We  can  use  this  fact  to  detect  infeasibility  (instead  of  using  a  maximum  flow  subroutine  as 
described  in  Section  4.1). 

Lemma  5.2  A  push  preserves  the  e-optimality  of  a  pseudoflow. 

Proof :  Obvious.  | 

Lemma  5.3  Suppose  f  is  an  e-optimal  pseudoflow  with  respect  to  a  price  function  p  and 
a  relabeling  is  applied  to  a  vertex  v.  Then  the  price  of  v  increases  by  at  least  e  and  the 
pseudoflow  f  is  e-optimal  with  respect  to  the  new  price  function  p' . 
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Figure  6:  Operation  push(v,  to)  is  applicable  if  e/(v)  >  0  and  (t>,  to)  is  in  the  shaded  rectangle 
(including  the  heavy  curve).  If  e/(v)  >  0  but  for  all  to  push  does  not  apply  to  (t>,to),  then 
relabel  applies  to  v.  Increasing  the  price  of  v  corresponds  to  shifting  the  kilter  diagram  down. 
The  absence  of  residual  arcs  in  the  shaded  rectangle  guarantees  that  the  price  of  v  increases  by 
at  least  e. 

Proof:  Before  the  relabeling,  c9(v,w)  >  0  for  all  (v,  to)  €  £/,  i.e.  cp{v,w)  +  p( to)  >  p{ v) 
for  all  (t>,  to)  €  Ej.  Thus  j/(t>)  =  min(ViW)6£/  {c(t>,  to)  +  p(w)  +  c)  >  p(v)  +  e. 

To  prove  the  second  part  of  the  lemma,  observe  that  the  only  residual  arcs  whose  reduced 
costs  are  affected  by  the  relabeling  are  those  of  the  form  (t>,  to)  or  (to,  v).  Any  arc  of  the  form 
(to,  v )  has  its  reduced  cost  increased  by  the  relabeling,  preserving  its  f-optimality  constraint. 
Consider  a  residual  arc  (v,  to).  By  the  definition  of  pf,  pf(v)  <  c(v,  to)  +  p(w)  +  e.  Thus 
cp*(o,to)  =  e(v,to)  -  j/(v)  +  p{ to)  >  -€,  which  means  that  (o,  v)  satisfies  its  c-optimality 
constraint.  | 

We  can  now  establish  the  correctness  of  the  refine  subroutine,  and  hence  of  the  entire 
algorithm. 
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Theorem  5.4  If  refine  terminates,  the  pseudoflow  f  it  returns  is  an  (-optimal  circulation. 

Proof :  The  initial  pseudoflow  is  e-optimal.  The  update  operations  maintain  e-optimality 
by  Lemmas  5.2  and  5.3.  By  Lemma  5.1,  refine  can  terminate  only  if  there  are  no  active 
vertices,  which,  as  noted  in  Section  2,  implies  that  /  is  a  circulation.  | 

Remark:  One  can  obtain  a  variant  of  the  refine  subroutine  by  choosing  a  constant  7  such 
that  0  <  7  <  e,  redefining  push  to  be  applicable  when  v  is  active,  Uf(v,w)  >  0,  and 
Cp(v,w )  <  —7,  and  redefining  relabel  to  be  applicable  when  v  is  active  and  u/(v,  w)  >  0 
implies  cp( v,w)  >  —7.  The  bounds  we  derive  on  the  original  version  of  refine  hold  for  this 
variant,  with  different  constant  factors  depending  on  7  (see  [26]).  We  have  been  unable  to 
obtain  any  asymptotic  improvements  by  choosing  a  value  of  7  other  than  zero. 

Next  we  analyze  the  number  of  update  operations  that  can  take  place  during  an  ex¬ 
ecution  of  the  generic  implementation  of  refine.  We  begin  with  a  few  definitions.  For  a 
pseudoflow  /  and  a  price  function  p,  we  call  an  arc  (t>,  w)  admissible  if  u /  (v,  w)  >  0  and 
cp(v,  w )  <  0.  Note  that  a  push  operation  is  applicable  to  an  arc  (v,  w)  if  and  only  if  v  is 
active  and  (v,  w)  is  admissible.  The  admissible  graph  is  the  graph  Ga  =  (V,  Ea)  such  that 
Ea  is  the  set  of  admissible  arcs. 

As  refine  executes,  the  admissible  graph  changes.  An  important  invariant  is  that  the 
admissible  graph  remains  acyclic,  as  was  observed  by  Bertsekas  [4]  for  the  case  when  costs 
are  integers  and  e  <  1/n.  To  prove  this  fact  for  arbitrary  real  costs  and  arbitrary  e,  we  need 
the  following  lemma. 

Lemma  5.5  Immediately  after  a  relabeling  is  applied  to  a  vertex  v,  no  admissible  arcs  enter 
v. 

Proof:  Let  (u,  v)  be  a  residual  arc.  Before  the  relabeling,  cp(u,  t;)  >  -c  by  e-optimality.  By 
Lemma  5.3,  the  relabeling  increases  p(v),  and  hence  Cp(u,  v),  by  at  least  e.  Thus  cp(ti,  v)  >  0 
after  the  relabeling.  | 

Corollary  5.6  Throughout  the  running  of  refine,  the  admissible  graph  is  acyclic. 

Proof :  Initially  the  admissible  graph  contains  no  arcs  and  is  thus  acyclic.  Pushes  obviously 
preserve  acyclicity.  Lemma  5.5  implies  that  relabelings  also  preserve  acyclicity.  | 

Next  we  derive  a  crucial  lemma. 
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Lemma  8.7  Let  f  be  a  pseudoflow  and  f  a  circulation.  For  any  vertex  t;  with  e/(v)  >  0, 
there  is  a  vertex  w  with  e/(u?)  <  0  and  a  sequence  of  distinct  vertices  v  =  t>o,  »i, . . . ,  t>/_i ,  v/  = 
w  such  that  tJi+i)  €  Ef  and  (v,+i, »,)  £  Efi  for  0  <  i  <  l. 


Proof:  Let  v  be  a  vertex  with  e/(v)  >  0.  Define  G+  =  (V,E+),  where  E+  =  {(x,y)\f'(x,y)  > 
/(x,y)},  and  define  G-  =  {V,E-),  where  E _  =  {( x,y)\f(x,y )  >  /'(x,y)}.  Then  E+  C  Ef, 
since  (x,y)  €  E+  implies  /(x,  y)  <  f(x,  y)  <  u(x,y).  Similarly  E-  C  Ef.  Furthermore 
(x,y)  €  E+  if  and  only  if  (y,  x)  6  E-  by  antisymmetry.  Thus  to  prove  the  lemma  it  suffices 
to  show  the  existence  in  G+  of  a  simple  path  v  =  vq,v\  with  e/(t;/)  <  0. 

Let  S  be  the  set  of  all  vertices  reachable  from  t>  in  G  and  let  ^  =  V  —  5.  (Set  3?  may  be 
empty.)  Fbr  every  vertex  pair  (x,y)  6  5x5,  /(x,y)  >  /'(x,y),  for  otherwise  y  6  S.  We 
have 

0  =  E(XlV)€(5xS)n£  /'(*>  y)  since  f  is  a  circulation 

<  E(x,y)g(sx5)n£;  /(*»  V )  holds  term-by-term 

=  E(XtV)6(sx5)n£:  /(*>  y )  +  £(X,v)€(SxS)nE  /(*,  V)  hY  antisymmetry 

=  E(x,»)e(SxV)nE  /(*>  y)  by  definition  of  3 

e/(x)  by  antisymmetry. 

But  v  £  S.  Since  e/(v)  >  0,  some  vertex  weS  must  have  e/(w)  <0.  | 

Using  Lemma  5.7  we  can  bound  the  amount  by  which  a  vertex  price  can  increase  during 
an  invocation  of  refine.  The  next  lemma  is  similar  to  the  lemma  of  Bland  and  Jensen  [7] 
that  bounds  the  number  of  maximum  flow  computations  in  a  scaling  step  of  their  algorithm. 

Lemma  5.8  The  price  of  any  vertex  v  increases  by  at  most  3 ne  during  an  execution  of 
refine. 

Proof :  Let  fa  and  pit  be  the  circulation  and  price  functions  on  entry  to  refine.  Suppose 
a  relabeling  causes  the  price  of  a  vertex  v  to  increase.  Let  /  be  the  pseudoflow  and  p  the 
price  function  just  after  the  relabeling.  Then  e/(u)  >  0.  Let  v  =  vo,t?i with 
e/(w)  <  0  be  the  vertex  sequence  satisfying  Lemma  5.7  for  /  and  f  =  fa- 
The  e-optimality  of  /  implies 


/-i  i-i 

l(  ^  l^Cp(v,-,v,+1)  =  p(w>)  -  p(t-)  +  ^c(i-„?v+1) 
«=0  isO 
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(8) 


The  2e-optimality  of  fn  implies 


1-1  l-i 

~  2*f  <  HcP8«(»«+i»t,0  =  P2i{v)  -  P2t(^)  +  £c(v,+1,v,)  (9) 

i=0  i=0 

But  5Zl=o  c(w«»  w*'+i)  =  -  XZi=o  c(»i+i» »«)  by  cost  antisymmetry.  Furthermore  p(w)  = 
P2((w)  since  a  vertex  with  negative  excess  has  the  same  price  as  long  as  its  excess  remains 
negative.  Add’"«r  inequalities  (8)  and  (9)  and  rearranging  terms  thus  gives 

p(v)  <  P2t(v)  +  3 U  <  j>2t(v)  +  3 ne. 

I 

Now  we  count  relabelings. 

Lemma  5.9  The  number  of  relabelings  during  an  execution  of  refine  is  at  most  3 n(n  —  1). 

Proof:  Immediate  from  Lemmas  5.3  and  5.8.  (At  least  one  vertex  is  never  relabeled;  namely, 
the  last  remaining  one  with  negative  excess.)  | 

To  count  pushes,  we  amortize  the  saturating  pulses  over  the  relabelings  and  the  nonsat¬ 
urating  pushes  over  the  relabelings  and  the  saturating  pushes. 

Lemma  5.10  The  number  of  saturating  pushes  during  one  execution  of  refine  is  at  most 
3  nm. 

Proof :  For  an  arc  (v,  to),  consider  the  saturating  pushes  along  this  arc.  Before  the  first  such 
push  can  occur,  vertex  v  must  be  relabeled.  After  such  a  push  occurs,  v  must  be  relabeled 
before  another  such  push  can  occur.  But  v  is  relabeled  at  most  3 n  times.  Summing  over 
all  arcs  gives  the  desired  bound.  | 

The  bound  in  the  next  lemma  and  its  proof  are  due  to  Ronald  Rivest.  (Our  original 
bound  was  0(n4).) 

Lemma  5.11  The  number  of  nonsaturating  pushes  during  one  execution  of  refine  is  at 
most  3 n2(m  +  n). 
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Proof:  For  each  vertex  u,  let  $(v)  be  the  number  of  vertices  reachable  from  v  in  the  current 
admissible  graph  Ga-  Let  $  =  0  if  there  are  no  active  vertices,  $  =  £{$(t>)|t>  is  active} 
otherwise.  Throughout  the  running  of  refine,  4>  >  0.  Initially  $  <  n,  since  Ga  has  no  arcs. 

Consider  the  effect  on  $  of  update  operations.  A  nonsaturating  push  decreases  $  by  at 
least  one,  since  by  Corollary  5.6  Ga  is  always  acyclic.  A  saturating  push  can  increase  $ 
by  at  most  n,  since  at  most  one  inactive  vertex  becomes  active.  If  a  vertex  v  is  relabeled, 
4  also  can  increase  by  at  most  n ,  since  by  Lemma  5.5  $(u>)  for  w  ^  v  can  only  decrease. 
The  total  number  of  nonsaturating  pushes  is  thus  bounded  by  the  initial  value  of  $  plus  the 
total  increase  in  ♦  throughout  the  algorithm,  i.e.  by  n  -f  3n2(n  -  1)  +  3n2m  <  3n2(m  +  n). 

I 

6  Efficient  Sequential  Implementation 

The  running  time  of  refine  depends  upon  the  order  in  which  basic  operations  are  applied 
and  on  the  details  of  the  implementation,  but  any  reasonable  implementation  will  run  in 
polynomial  time.  As  a  first  step  toward  obtaining  an  efficient  sequential  implementation,  we 
develop  an  implementation  that  runs  in  0(n2m)  time.  Then  we  improve  the  implementation 
to  obtain  an  0(n3)  time  bound. 

6.1  A  Generic  Implementation 

To  describe  a  straightforward  implementation  of  the  generic  algorithm,  we  need  some  data 
structures  to  represent  the  network  and  the  pseudoflow.  We  associate  a  positive  direction 
with  each  edge  {t>,u>},  and  we  store  with  each  such  edge  the  four  values  u(v,w),  u(w,v), 
f(v,w),  and  c(w,  to).  Each  vertex  v  has  a  list  of  the  incident  edges  {v,  to},  in  fixed  but 
arbitrary  order.  (Thus  an  edge  (v,  to}  appears  in  exactly  two  edge  lists,  the  one  for  v  and 
the  one  for  to.)  For  each  vertex  v  we  maintain  the  excess  ej(v)  and  a  current  edge  {u,to}; 
the  corresponding  current  arc  (v,to)  is  the  current  candidate  for  a  push  out  of  v.  Initially 
the  current  edge  is  the  first  edge  on  the  edge  list  of  v. 

The  generic  implementation  of  the  while  loop  in  refine  consists  of  repeating  the  following 
steps  until  there  are  no  active  vertices:  select  an  active  vertex  v ;  apply  to  v  the  push/relabel 
operation.  The  push/relabel  operation  is  described  in  Figure  7.  Such  an  operation  applies 
to  an  active  vertex  v  with  current  edge  {v,  tv).  It  pushes  flow  through  the  arc  ( v ,  w)  if  this 


push/rtlabel(v). 

Applicability:  v  is  active. 

Action:  let  {v,  u>}  be  the  current  edge  of  t>. 

if  uj(v,w)  >  0  and  cp(v,w)  <  0  then  push(v,w) 
else 

if  {v,  u>}is  not  the  last  edge  on  the  edge  list  of  v  then 
replace  {t>,  uj}  as  the  current  edge  of  v 
by  the  next  edge  on  the  list 

else  begin 

make  the  first  edge  on  the  edge  list  of  v  the  current  edge; 
relabel(v); 

end. 

Figure  7:  The  push/relabel  operation. 

arc  is  eligible.  If  not,  it  replaces  the  current  edge  of  v  by  the  next  edge  on  the  edge  list  of 
t;  unless  the  current  edge  is  the  last  edge  on  the  list,  in  which  case  it  makes  the  first  edge 
on  the  list  the  current  one  and  relabels  v. 

Lemma  0.1  The  push/relabel  procedure  uses  the  price  update  operation  only  when  this 
operation  is  applicable. 

Proof:  Suppose  a  vertex  v  is  relabeled.  Just  before  the  relabeling,  for  each  arc  (v,w), 
either  cp(v,w)  >  0  or  u/(v,w)  =  0,  because  p(v)  has  not  changed  since  {v,  u>}  was  most 
recently  the  current  edge  of  v,  p( w)  never  decreases,  and  uj(v,  w)  cannot  increase  unless 
cp(w,  v)  <  0,  i.e.  cp(v,  w)  >  0.  | 

The  generic  implementation  needs  one  additional  data  structure,  a  set  S  containing  all 
active  vertices.  Initially  S  contains  all  vertices  whose  excess  becomes  positive  during  the 
initialization  step  of  refine.  Updating  S  takes  only  0(1)  time  per  push/relabel  operation. 
(Such  an  operation  requires  possibly  deleting  one  vertex  from  S  and  adding  one  vertex  to 
5.) 

Theorem  0.2  The  generic  implementation  of  refine  runs  in  0(n2m)  time,  giving  an 
0(nJm  min{log(nC),  m  log  ra})  time  bound  for  computing  a  minimum-cost  circulation. 

Proof:  The  total  number  of  passes  through  each  edge  list  is  O(n)  by  Lemmas  5.3  and  5.8. 
The  desired  bound  follows  from  Lemmas  5.9,  5.10,  and  5.11.  I 


Discharge. 

Applicability:  v  is  active. 

Action:  apply  pusk/relshel  operations  to  v  until  v  is  relabeled 

or  becomes  inactive. 

Figure  8:  The  discharge  operation. 

6.2  The  Wave  Implementation 

To  obtain  a  faster  version  of  refine ,  we  begin  by  somewhat  restricting  the  freedom  of  choice 
in  push/relabel  operations.  The  restricted  algorithm  consists  of  repeatedly  selecting  an 
active  vertex  and  applying  to  it  the  discharge  operation,  described  in  Figure  8.  A  discharge 
operation  applies  push/relabel  operations  to  an  active  vertex  until  it  is  relabeled  or  becomes 
inactive,  i.e.  its  excess  drops  to  zero. 

Remark:  We  can  actually  allow  a  certain  amount  of  flexibility  in  the  discharge  operation. 
Specifically,  after  a  vertex  v  is  relabeled,  additional  push/relabel  operations  can  optionally 
be  applied  to  v,  as  long  as  v  remains  active.  The  bounds  we  shall  derive,  although  stated 
for  the  original  version  of  discharge,  are  valid  for  this  more  general  version  as  well. 

There  remains  the  issue  of  the  order  in  which  to  consider  active  vertices.  The  first-in 
first-out  (FIFO)  algorithm  consists  of  maintaining  the  set  of  active  vertices  as  a  queue, 
repeatedly  discharging  the  front  vertex  on  the  queue  and  adding  newly  active  vertices  to 
the  rear  of  the  queue.  We  might  expect  the  first-in,  first-out  algorithm  to  be  very  efficient, 
since  the  analogous  algorithm  for  the  maximum  flow  problem  requires  0(n3)  passes  over  the 
queue  and  runs  in  0(n3)  time.  Unfortunately,  the  first-in,  first-out  implementation  of  refine 
makes  0(n3)  passes  over  the  queue  in  the  worst  case,  and  we  are  unable  to  establish  a  bound 
on  the  total  running  time  better  than  the  0(n2m )  bound  of  the  generic  implementation. 
The  following  lemma  establishes  an  0(n3)  bound  on  queue  passes;  the  appendix  describes 
a  class  of  examples  that  take  fl(n3)  passes. 

We  define  passes  over  the  queue  as  follows.  Pass  one  consists  of  the  discharge  operations 
applied  to  vertices  added  to  the  queue  during  the  initialization  of  the  pseudoflow.  For  i  >  2, 
pass  i  consists  of  the  discharge  operations  applied  to  vertices  added  to  the  queue  during 
pass  i  -  1. 

Lemma  6.3  During  the  execution  of  the  first-in,  first-out  implementation  of  refine ,  there 
are  0(n3)  passes  over  the  queue. 
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Proof:  As  in  the  proof  of  Lemma  5.11,  let  $(t>),  for  each  vertex  t»,  be  the  number  of 
vertices  reachable  from  v  in  the  current  admissible  graph  Ga-  Let  $  =  0  if  there  are  no 
active  vertices,  ♦  =  max{$(u)|v  is  active}  otherwise.  Consider  the  effect  on  $  of  a  pass 
over  the  queue.  If  no  relabelings  take  place  during  the  pass,  $  drops  by  at  least  one,  since 
the  excess  pushed  from  any  vertex  is  pushed  to  vertices  with  smaller  values  of  $.  If  one 
or  more  relabelings  take  place  during  the  pass,  $  can  increase  by  at  most  n,  since  it  is 
always  the  case  that  0  <  $  <  n.  The  total  number  of  passes  is  thus  at  most  the  number 
of  relabelings  plus  the  number  of  passes  in  which  #  decreases.  The  latter  is  at  most  n  (the 
maximum  possible  initial  value  of  $)  plus  3n2(n  —  1)  (i.e.  n  times  the  maximum  number  of 
relabelings).  This  gives  a  bound  of  at  most  3n(n  —  1)  +  n  +  3n2(n  —  1)  <  3n3  on  the  total 
number  of  passes.  | 

Although  the  first-in,  first-out  approach  fails  to  improve  the  efficiency  of  refine,  a  similar 
approach,  proposed  by  Charles  Leiserson  (private  communication)  and  independently  by 
Bertsekas  [4]  (without  some  implementation  details  and  the  use  of  seeding),  does  produce 
a  faster  algorithm.  We  call  this  method  the  wave  method  since  it  resembles  the  wave 
algorithm  for  computing  maximum  network  flows  [53].  The  wave  method  is  a  generalization 
of  a  version  of  our  maximum  flow  algorithm  that  pushes  the  flow  from  an  active  vertex  with 
the  biggest  distance  label  [29];  in  the  wave  method,  the  vertices  are  processed  in  topological 
order  with  respect  to  the  admissible  graph. 

The  wave  method,  described  in  Figure  9,  maintains  a  list  L  of  all  the  vertices  of  G,  in 
topological  order  with  respect  to  the  current  admissible  graph  Ga ,  i.e.  if  (r,  w)  is  an  arc  of 
Ga,  v  appears  before  w  in  L.  The  method  also  maintains  a  current  vertex  in  L,  which  is  the 
next  candidate  for  discharging.  Initially  L  contains  the  vertices  of  G  in  any  order,  and  the 
current  vertex  is  the  first  one  on  L.  The  method  repeatedly  scans  L,  applying  the  discharge 
operation  to  each  active  vertex  encountered.  When  a  discharge  causes  a  relabeling  to  take 
place,  the  affected  vertex  v  is  moved  to  the  front  of  L,  but  the  next  vertex  examined  is  the 
one  after  the  old  position  of  v.  The  method  terminates  when  there  are  no  active  vertices. 

The  key  to  analyzing  the  wave  method  is  to  observe  that,  because  of  the  topological 
ordering  of  L,  the  method  terminates  immediately  after  a  pass  through  L  during  which  no 
relabelings  occur. 

Lemma  0.4  The  wave  method  terminates  after  0(n2)  passes  through  L. 

Proof:  Lemma  5.5  implies  that  the  wave  method  maintains  the  invariant  that  L  is  topolog- 
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procedure  mm; 

let  £  be  a  list  of  all  vertices; 
let  v  be  the  first  vertex  on  L; 
while  3  an  active  vertex  do  begin 
if  v  is  active  then  begin 
ii»ekar§e(v); 

if  the  discharge  has  relabeled  v  then 
move  v  to  the  front  of  L ; 

end; 

replace  v  by  the  vertex  after  the  old  position  of  v  on  L, 
or  by  the  first  vertex  on  L  if  u  was  previously  last  on  L; 

end; 

end. 


Figure  9:  The  wave  method. 

ically  ordered.  If  a  pass  over  L  occurs  during  which  no  relabelings  occur,  then  the  ordering 
of  L  does  not  change  during  the  pass  and  every  vertex  is  able  to  reduce  its  excess  to  zero. 
Thus  no  active  vertices  exist  after  the  pass.  | 


Theorem  6.5  The  wave  implementation  of  refine  rung  in  0(n3)  time,  giving  an 
0(n3min{log(nC),mlogn})  hound  for  finding  a  minimum-cost  circulation. 

Proof:  Lemma  6.4  implies  that  there  are  0(n3)  nonsaturating  pushes  (one  per  vertex  per 
pass)  during  an  execution  of  refine.  By  Lemma  5.10  there  are  O(nm)  saturating  pushes. 
The  time  taken  by  refine  is  0(n3)  plus  0(1)  per  push,  for  a  total  of  0(n3).  | 

The  following  order  of  processing  vertices  also  leads  to  an  0(n3)  running  time  for  refine : 
apply  a  discharge  operation  to  the  first  active  vertex  on  L,  move  this  vertex  to  the  front  of 
L  if  it  has  been  relabeled,  and  repeat  until  there  are  no  active  vertices.  The  only  difference 
from  the  version  of  the  wave  subroutine  described  on  Figure  9  is  that  when  a  vertex  v  is 
relabeled  and  moved  to  the  beginning  of  L,  the  processing  continues  from  the  beginning  of 
L  (i.e.  from  the  current  position  of  r),  rather  than  from  the  old  position  of  r.  We  call  this 
the  first-active  version  of  refine. 

7  Use  of  Dynamic  Trees 


Through  the  use  of  sophisticated  data  structures,  we  can  further  reduce  the  running  time 
of  refine  on  sparse  graphs.  We  shall  describe  how  to  implement  the  first-active  version  of 
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find-root(v): 
find-*ize(v ): 
find-valvc(v): 
find-min(v): 

change-valuc(v,  x): 

link(v,  ui): 

c*t(v): 


Find  and  return  the  root  of  the  tree  containing  vertex  v. 

Find  and  return  the  number  of  vertices  in  the  tree  containing  vertex  v. 
Compute  and  return  g(v). 

Find  and  return  the  ancestor  w  of  v  of  minimum  value  g(w).  In  case  of  a 
tie,  choose  the  vertex  w  closest  to  the  root. 

Add  real  number  x  to  g(w)  for  each  ancestor  to  of  t>.  (We  adopt  the 
convention  that  oo  +  (—00)  =  0.) 

Combine  the  trees  containing  vertices  v  and  to  by  making  to  the  parent 
of  t>.  This  operation  does  nothing  if  v  and  to  are  in  the  same  tree  or  if  v 
is  not  a  tree  root. 

Break  the  tree  containing  v  into  two  trees  by  deleting  the  arc  from  v  to 
its  parent.  This  operation  does  nothing  if  v  is  a  tree  root. 


Figure  10:  Dynamic  tree  operations. 

refine ,  described  at  the  end  of  Section  6.2,  so  that  it  runs  in  0(nm log( n3/m))  time. 

There  are  two  bottlenecks  in  the  implementation  of  this  algorithm.  First,  we  need  a  way 
to  maintain  L  so  that  the  first  active  vertex  is  easy  to  find.  For  this  purpose  we  use  a  data 
structure  based  on  finger  search  trees  [39,54].  Second,  we  need  a  way  to  maintain  the  set 
of  current  edges  so  that  the  time  per  nonsaturating  push  is  o(l).  For  this  purpose  we  use 
the  dynamic  tree  data  structure  of  Sleator  and  Tarjan  [48,49].  The  0(nm  log( na/m))  time 
bound  of  the  resulting  version  of  refine  matches  that  of  the  dynamic  tree  implementation 
of  our  maximum  flow  algorithm  [29],  and  its  derivation  is  similar,  except  for  the  additional 
complication  of  maintaining  L. 

7.1  The  Tree  Method 

We  shall  postpone  to  Section  7.3  the  issue  of  maintaining  L  and  focus  on  the  use  of  dynamic 
trees.  The  dynamic  tree  data  structure  allows  the  maintenance  of  a  collection  of  vertex- 
disjoint  rooted  trees,  each  vertex  v  of  which  has  an  associated  value  g(v),  which  is  either 
a  real  number  or  00.  The  structure  supports  the  seven  tree  operations  described  in  Figure 
10.  The  total  time  for  l  tree  operations,  starting  with  a  collection  of  single-vertex  trees,  is 
0(/logA:),  where  k  is  an  upper  bound  on  the  maximum  number  of  vertices  in  a  tree.  (The 
implementation  of  dynamic  trees  discussed  in  [48,49]  does  not  support  find-size  operations, 
but  it  is  easily  modified  to  do  so,  as  discussed  in  [29].) 

In  our  application,  if  parenf(v)  is  the  parent  of  a  vertex  v  in  a  dynamic  tree,  then 
{v,  paren<(t;)}  is  the  current  arc  of  v  and  (v,poren<(r))  is  admissible,  i.e.  u/(r,poren^v))  >  0 
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Send(v). 

Applicability:  *  is  active. 

Actio*:  whBe  find-root(v)  ^  v  and  e(v)  >  0  do  begin 

■end  6  «—  min{e(v),^»d-v«i«e(^nd-m»»»(t;))}  units  of  flow 

along  the  tree  path  from  v  by  performing  change-vaiue(v,  -6); 
while  fi»d-v*l%e(find-mtn(v))  =  0  do  begin 
i «—  fi*d-min(v); 

perform  c*i(t)  followed  by  ckange-valne(i,  oo); 

end; 

end. 


Figure  11:  The  Send  operation. 


and  e^t^poren^o))  <  0.  Not  all  admissible  current  arcs  are  tree  arcs,  however.  The  value 
g(v)  of  a  vertex  v  in  its  dynamic  tree  is  « i/( v,  parent(v))  if  v  has  a  parent  and  oo  if  v  is  a  tree 
root.  Initially,  each  vertex  is  in  a  one- vertex  dynamic  tree  and  has  value  oo.  We  limit  the 
maximum  tree  size  to  k,  where  Jfc  ha  parameter  to  be  chosen  later,  satisfying  2  <  k  <  n. 

The  dynamic  tree  implementation  of  refine ,  which  we  shall  call  the  tree  method ',  uses 
tree  operations  to  send  flow  along  an  entire  tree  path  at  a  time,  thereby  avoiding  explicitly 
dealing  with  nonsatnrating  pushes.  The  heart  of  the  algorithm  is  the  procedure  send(v) 
defined  in  Fignre  11,  which  pashes  excess  from  a  nonroot  vertex  v  to  the  root  of  its  tree, 
cuts  arcs  saturated  by  the  pnsh,  and  repeats  these  steps  until  e(»)  =  0  or  v  is  a  tree  root. 

Instead  of  the  push/relabel  operation,  the  tree  method  uses  the  tree  push/relabel  oper¬ 
ation,  defined  in  Figure  12.  Such  an  operation  applies  to  an  active  vertex  v  that  is  the 
root  of  a  dynamic  tree.  There  are  two  main  cases.  The  first  case  occurs  if  the  current  edge 
{o, «?}  of  t;  is  such  that  (u,  w)  is  admissible.  If  the  trees  containing  v  and  w  together  have 
at  most  k  vertices,  these  trees  are  linked  by  making  w  the  parent  of  v,  and  then  a  send(w) 
is  done.  If  these  trees  together  contain  more  than  k  vertices,  a  push(v ,  w)  is  done  followed 
by  a  send(w).  The  second  case  occurs  if  (v,  w)  is  not  admissible.  In  this  case  the  current 
edge  of  v  is  updated  and  v  is  relabeled  if  the  previous  current  edge  was  the  last  on  the  edge 
list  of  v.  If  v  is  relabeled,  all  tree  arcs  entering  v  are  cut,  thereby  preserving  the  invariant 
that  all  tree  arcs  are  admissible. 

The  tree  method  uses  the  modified  form  of  discharge  shown  in  Figure  13,  called  tree- 
discharge,  which  is  the  same  as  discharge  except  that  push-relabel  operations  are  replaced 
by  tree-push/relabel  operations,  which  are  applied  to  an  active  vertex  v  until  it  becomes 
inactive. 
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trec-pnah/rclabeHy) . 

Applicability:  v  is  an  active  tree  root. 


.Acfion: 


let  {t>,  to}  be  the  current  edge  of  t>; 

(1)  if  (v,  to)  is  admissible  then 

(la)  if  find-size(v)  +  find-aize(w)  <  k  then  begin 

[make  w  the  parent  of  t>] 
change-val*e(v,  — oo); 
change- val%e(v,  rj(v,  to)); 
link{v,  to); 

[push  excess  from  v  to  to] 
send(t>); 
end 

(lb)  else  \find-aizt(v)  +  find-aize(w)  >  ib]  begin 

puah(v,  to); 
aend{  to); 

end 

(2)  else  [(v,  to)  is  not  admissible] 

(2a)  if  [v,  to}  is  not  the  last  edge  on  the  edge  list  of  v  then 

replace  {v,  to}  as  the  current  edge  by  the  next  edge  on  the  list 
(2b)  else  [{v,  to}  is  the  last  edge  on  the  edge  list  of  v]  begin 
make  the  first  edge  on  the  list  the  current  one; 
for  every  child  u  of  v  do  begin 

c*K  «); 

change- value(u,  oo); 
end; 
relabe^v); 
end. 


Figure  12:  The  tree-push/relabel  operation. 

The  tree  method  itself  is  defined  in  Figure  14.  It  differs  from  the  wave  method  in  two 
respects:  active  vertices  are  processed  in  a  different  order,  and  discharge  operations  are 
replaced  by  tree-discharge  operations. 

It  is  important  to  realize  that  the  tree  method  stores  values  of  the  pseudoflow  /  in  two 
different  ways.  If  {v,  w}  is  an  edge  that  is  not  a  tree  edge,  then  /(v,  w)  is  stored  explicitly, 
with  {t>, «?}.  If  {»,  in}  is  a  tree  edge,  with  w  the  parent  of  v,  then  u/(v,  w)  is  stored  implicitly 
in  the  dynamic  tree  data  structure  as  g(v).  Whenever  a  tree  edge  {v,tv}  is  cut,  <7(1;)  must 


free-  discharge(y) . 

Applicability:  v  is  an  active  tree  root. 

Acfion;  apply  tree-push/relabel  operations  to  v  until  v  becomes  inactive.; 


Figure  13:  The  tree-discharge  operation. 


fc 


procedure  free; 

let  L  be  a  Uat  of  all  vertices; 
while  3  an  active  vertex  do  begin 

let  v  be  the  first  active  vertex  on  L; 
tree-4uekarge(v); 

if  the  duck* r§t  has  relabeled  v  then 
move  v  to  the  front  of  L ; 

end; 

end. 

Figure  14:  The  tree  method. 

be  computed  and  f(v,  w)  set  to  its  correct  current  value.  In  addition,  when  the  algorithm 
terminates,  flow  values  must  be  computed  for  all  edges  remaining  in  dynamic  trees. 

Three  observations  imply  the  correctness  of  the  tree  method.  First,  the  algorithm  main¬ 
tains  the  invariant  that  every  tree  arc  is  admissible.  Second,  the  acyclicity  of  the  admissible 
graph  implies  that  the  algorithm  never  attempts  to  link  two  vertices  in  the  same  dynamic 
tree.  Third,  a  vertex  v  that  is  not  a  tree  root  can  be  active  only  in  the  middle  of  case  (1) 
of  a  tree-push /relabel  operation.  To  see  this,  note  that  only  in  this  case  does  the  algorithm 
create  an  active  nonroot  vertex,  and  this  event  is  followed  by  a  send  operation,  after  which 
all  the  excess  is  on  one  or  more  roots. 

Remark:  If  we  let  the  maximum  tree  size  Jb  equal  n  and  generalize  the  tree  method  to  apply 
discharge  operations  to  active  vertices  in  any  order,  we  obtain  an  0(nm logo)  implementa¬ 
tion  of  refine.  See  [26,29]  for  details  of  the  analysis.  This  method  does  not  need  an  elaborate 
data  structure  to  select  active  vertices  for  processing,  but  its  time  bound  is  asymptotically 
worse  than  that  of  the  tree  method  on  non-sparse  graphs. 

Remark:  The  version  of  the  tree  method  that  processes  active  vertices  in  the  same  order  as 
the  wave  method,  i.e.  by  making  passes  through  the  list  L,  also  runs  in  0(nm  log(  n2/m)) 
time,  but  it  is  slightly  more  complicated  to  implement. 

7.2  Analysis  of  the  Tree  Method 

In  order  to  analyze  the  tree  method,  we  introduce  a  bit  of  terminology.  We  say  a  vertex  is 
discharged  when  a  discharge  operation  is  applied  to  it.  We  say  a  vertex  is  activated  when  it 
becomes  active  while  it  is  a  root  vertex  or  it  becomes  a  tree  root  while  it  is  active.  A  vertex 
can  only  be  activated  during  the  initialization  or  in  cases  (la)  and  (lb)  of  tree-push/relabel. 


In  case  (la),  each  push  of  flow  from  v  during  the  operation  send  (v)  can  activate  a  vertex. 
In  case  (lb),  the  operation  push  («,«?)  can  activate  w  if  w  is  a  root.  If  w  is  not  a  root,  each 
push  of  flow  from  w  during  the  operation  send  (tt>)  cam  activate  a  vertex;  and,  in  addition, 
the  last  cut  of  the  send,  if  it  cuts  the  current  arc  leaving  w,  can  activate  w. 

Lemma  7.1  Between  relabelings,  the  order  of  L  does  not  change,  and  each  vertex  is  dis¬ 
charged  at  most  once. 

Proof:  The  only  time  the  order  of  L  is  changed  is  when  a  vertex  is  relabeled.  When  a 
vertex  is  discharged,  its  excess  is  moved  to  vertices  after  it  on  L.  The  lemma  follows.  | 

Lemma  7.2  The  maximum  size  of  a  dynamic  tree  is  k.  The  number  of  dynamic  tree 
operations  is  0(nm)  plus  0(1)  per  vertex  activation. 

Proof :  The  test  in  case  (la)  of  tree-push/relabel  guarantees  that  the  size  of  a  dynamic  tree 
can  never  exceed  k.  Each  tree-push/relabel  operation  does  0(1)  tree  operations  plus  0(1) 
tree  operations  per  cut  operation  (in  invocations  of  send  and  in  case  (2b)).  The  total  number 
of  cut  operations  is  at  most  the  number  of  link  operations,  which  is  0{nm)  by  Lemmas  5.3 
and  5.8.  (An  operation  link{ v,  w)  can  only  occur  once  between  relabelings  of  v.)  Thus  the 
number  of  tree  operations  is  0(nm)  plus  0(1)  per  tree-push/relabel.  During  an  operation 
discharge(v),  every  tree-push/relabel  except  the  last  does  either  a  link,  a  saturating  push,  or 
a  relabel ,  or  changes  the  current  edge  of  v.  Thus  there  are  0(nm)  tree  operations  plus  0(1) 
per  discharge.  The  number  of  discharge  operations  equals  the  number  of  vertex  activations. 

I 

The  next  lemma,  analogous  to  Lemma  5.2  of  [29],  is  the  heart  of  the  analysis. 

Lemma  7.3  The  number  of  vertex  activations  is  0(nm  +  n3/k). 

Proof:  There  are  at  most  n  -  1  activations  during  the  initialization.  All  others  occur  in 
cases  (la)  and  (lb)  of  tree-push/relabel.  In  either  case  the  number  of  activations  is  at  most 
one  more  than  the  number  of  cuts  done  during  the  invocation  of  send  in  the  case.  Each 
occurrence  of  (la)  results  in  a  link.  It  follows  that  the  number  of  activations  is  0(nm )  plus 
one  per  occurrence  of  (lb)  that  results  in  a  nonsaturating  push,  an  activation,  and  no  cuts. 
Let  us  call  such  an  occurrence  of  (lb)  a  critical  occurrence. 


For  any  vertex  tt,  let  T„  be  the  dynamic  tree  containing  u  and  let  |rtt|  be  the  number 
of  vertices  it  contains.  We  say  T„  is  smtUl  if  |TU|  <  k/2  and  large  otherwise.  At  any  time 

there  are  at  most  2»/Jfc  large  trees. 

We  shall  charge  each  critical  occurrence  of  (lb)  to  either  a  link,  a  cut,  or  a  relabeling. 
We  charge  0(1)  occurrences  to  each  Bide  or  cut  and  0(n/k)  to  each  relabeling.  This  gives  a 
bound  of  0(«m  +  n*/k)  on  the  total  number  of  critical  occurrences,  and  hence  on  the  total 
number  of  activations.  For  ease  in  stating  the  argument,  we  shall  assume  that  a  “phantom” 
relabeling  occurs  at  initialization. 

Consider  a  critical  occurrence  (lb),  in  which  a  nonsaturating  push  from  a  vertex  v  to  a 
vertex  w  occurs,  followed  by  a  send  (tv).  Since  this  occurrence  is  critical,  T,„  does  not  change 
as  a  result  of  the  send  and  the  root  of  Tw  is  the  only  vertex  activated  by  the  occurrence.  The 
condition  on  entry  to  case  (lb)  guarantees  that  Tv  or  Tw  is  large,  giving  us  two  possibilities 
to  consider. 

Suppose  Tv  is  large.  Vertex  v  is  the  root  of  Tv.  The  critical  occurrence  of  (lb)  removes 
all  the  excess  from  v,  which  means  that  such  an  occurrence  can  apply  to  a  vertex  v  only 
once  between  relabelings.  If  Tv  has  changed  since  the  last  preceding  relabeling,  charge  the 
critical  occurrence  to  the  link  or  cut  that  changed  Tv  most  recently  before  the  occurrence. 
The  total  number  of  such  charges  is  at  most  one  per  link  and  two  per  cut.  (A  link  forms  one 
new  tree;  a  cut,  two.)  If  Tv  has  not  changed  since  the  last  preceding  relabeling,  charge  the 
occurrence  to  this  relabeling.  Since  at  any  time,  including  just  after  a  relabeling,  there  are 
at  most  2 n/k  large  trees,  the  total  number  of  such  charges  is  at  most  2 n/k  per  relabeling. 

Suppose  on  the  other  hand  that  Tw  is  large.  The  critical  occurrence  activates  the  root 
of  Tw,  say  r.  A  given  vertex  r  can  be  activated  at  most  once  between  relabelings.  If  Tw  has 
changed  since  the  last  preceding  relabeling,  charge  the  occurrence  to  the  link  or  cut  that 
most  recently  changed  Tw  before  the  occurrence.  The  number  of  such  charges  is  at  most 
one  per  link  and  two  per  cut.  If  Tw  has  not  changed,  charge  the  occurrence  to  the  last 
preceding  relabeling.  The  number  of  such  charges  is  at  most  2n/k  per  relabeling.  | 


Lemma  7.4  The  running  time  of  the  tree  method,  not  counting  time  spent  maintaining  L 
and  finding  vertices  to  discharge,  is  0((nm  +  n3 /k)  logit). 


maintenance  of  current  edges.  | 


7.3  Representation  of  the  Vertex  List 

The  task  remaining  is  to  select  a  representation  for  the  vertex  list  L  and  analyze  the  time 
needed  to  maintain  it.  To  facilitate  the  finding  of  active  vertices,  we  split  L  into  sublists  by 
dividing  L  just  before  each  active  vertex.  Each  sublet  consists  of  a  first  vertex,  called  the 
head  of  the  sublist,  and  zero  or  more  inactive  vertices.  The  head  of  the  first  sublist  can  be 
active  or  inactive;  the  second  and  subsequent  sublists  all  have  active  heads.  We  represent 
X  by  a  doubly  linked  list  of  these  sublists. 

Consider  the  operations  on  L  needed  in  the  tree  method.  Finding  the  first  active  vertex 
on  L  requires  examination  of  the  heads  of  the  first  two  sublists  and  selection  of  the  first  of 
these  that  is  active.  After  a  vertex  v  is  relabeled,  it  must  be  moved  to  the  front  of  L.  We 
can  do  this  as  follows.  When  v  is  relabeled,  it  is  active.  If  v  is  not  already  on  the  front  of  L , 
we  delete  it  from  its  current  sublist  (of  which  it  is  the  head),  concatenate  what  remains  of 
this  sublist  with  the  preceding  sublist,  and  either  insert  v  into  the  front  of  the  first  sublist 
if  its  head  is  inactive,  or  make  v  into  a  one-element  sublist  on  the  front  of  L,  if  the  head  of 
the  old  first  sublist  is  active. 

List  L  must  also  be  updated  when  vertices  are  activated  and  discharged.  (Recall  from 
Section  7.2  that  a  vertex  is  activated  when  it  is  a  root  and  becomes  active  or  when  it  is 
active  and  becomes  a  root.)  We  can  do  this  as  follows.  When  a  vertex  t>  is  activated,  we  split 
the  sublist  containing  v  into  two  sublists,  with  v  the  head  of  the  second  one.  Immediately 
after  a  vertex  is  discharged,  we  concatenate  the  sublist  of  which  it  is  the  head  with  the 
preceding  sublist,  if  any. 

In  summary,  we  need  the  following  four  operations  on  sublists: 

1.  Initialize  a  sublist  containing  a  given  sequence  of  vertices. 

2.  Access  the  head  of  a  given  sublist. 

3.  Split  the  sublist  containing  a  given  vertex  just  before  that  vertex  and  return  the  two 
resulting  sublists. 

4.  Concatenate  two  sublists  and  return  the  resulting  sublist. 

Deleting  the  head  of  a  sublist  is  a  special  case  of  splitting;  adding  a  new  head  to  the 
front  of  a  sublist  is  a  special  case  of  concatenation. 


We  detire  a  sublist  representation  that  achieves  the  following  time  bounds  for  the  four 

operations: 

1.  0(1  log/)  to  initialize  a  sublist  containing  l  vertices. 

2.  0(1)  to  access  the  head  of  a  sublist. 

3.  0(1)  to  split  a  sublist  just  after  a  given  vertex. 

4.  0(1  +logmin{/i,/2})  to  concatenate  two  sublists  containing  l\  and  I2  vertices,  respec¬ 
tively. 

It  suffices  that  these  bounds  be  amortized ,  i.e.  that  the  total  time  for  an  arbitrary 
sequence  of  operations  starting  with  no  sublists  be  no  greater  than  the  sum  01  the  time 
bounds  of  the  operations.3  Let  us  call  a  sublist  representation  suitable  if  it  achieves  the 
desired  bounds  in  the  amortized  sense. 

Lemma  7.5  With  a  suitable  sublist  representation,  the  total  time  needed  to  maintain  L  and 
to  find  vertices  for  discharging  is  0((nm+  n3/k)logk),  where  k  is  the  maximum  dynamic 
tree  size. 

Proof:  The  time  to  initialize  the  sublists  of  which  L  is  comprised  is  0(n logo).  The  time 
to  find  the  first  active  vertex  on  L  is  0(1);  by  Lemma  7.3,  this  operation  occurs  0((nm  + 
n3/k)logk)  times.  The  number  of  splitting  operations  is  0(1)  per  vertex  activation  and 
0(1)  per  relabeling,  for  a  total  time  bound  of  0((nm  +  n3/k)  log k).  There  are  at  most  two 
concatenations  per  relabeling.  One  takes  0(1)  time;  the  other,  O(logn).  The  total  time  for 
concatenations  accompanying  relabelings  is  thus  0(n3  log  n). 

It  remains  to  bound  the  time  for  concatenations  that  occur  between  relabelings  as  a 
result  of  discharge  operations.  We  shall  show  that  the  total  concatenation  time  during  an 
interval  in  which  no  relabelings  occur  is  0((n/fc)logfc)  plus  0( logic)  per  vertex  activation, 
from  which  an  overall  0((nm  +  n3/k)logk)  bound  on  concatenation  time  follows. 

Consider  a  time  interval  I  during  which  no  relabelings  occur.  There  is  one  concatenation 
per  discharge  operation  during  /.  An  operation  discharge (u)  results  in  the  concatenation  of 
the  sublist  containing  v,  say  S{v ),  with  the  preceding  sublist.  Observe  that  during  interval 
I  successively  concatenated  sublists  S(v),  S(tt>),. . .  are  vertex-disjoint.  Call  a  sublist  S(r) 

aThe  second  author’s  survey  paper  [51]  contains  a  thorough  discussion  of  amortization. 


small  if  it  contains  at  most  k  vertices  and  large  otherwise.  The  time  for  concatenating  small 
sublists  is  O(log&)  per  concatenation,  i.e.  O(logfc)  per  vertex  activation.  Let  uj,  i>2,  •  •  •  >  t>f 
be  the  vertices  for  which  S(vj),  S(v2), . . . ,  S(vi )  are  large  when  concatenated  during  I.  Then 
/  <  n/k,  and  the  total  time  for  concatenating  these  sublists  is 

/ 

0(£lo g(l  +  |5(v,)|))  <  /log(l  +  n/l )  <  (n/k) log(l  +  k). 

i=i 

(The  worst  case  occurs  when  all  sublists  are  of  the  same  size.)  Thus  the  total  time 
for  concatenating  large  sublists,  summed  over  all  intervals  containing  no  relabelings,  is 
O((n3/k)logk).  | 

Theorem  7.0  With  a  suitable  sublist  representation,  the  tree  method  runs  in 
0(nm  log(  n2/m))  time,  giving  an  0(nmlog(n2/m)min{log(nC)),mlogn})  bound  for  find¬ 
ing  a  minimum-cost  circulation. 

Proof:  Choose  k  =  2 n2/m  and  apply  Lemmas  7.4  and  7.5.  | 

A  suitable  way  to  represent  sublists  is  to  use  finger  search  trees  [39,54].  Such  trees  give 
the  following  amortized  time  bounds  for  the  four  sublist  operations  (see  [54]): 

1.  0(/log/)  to  initialize  a  sublist  containing  l  vertices. 

2.  0(1)  to  access  the  head  of  a  sublist. 

3.  0(l4-logmin{/i,/2})  to  split  a  sublist  into  sublists  of  sizes  li  and  /2,  or  to  concatenate 
two  sublists  of  sizes  li  and  /2. 

These  are  not  quite  the  desired  bounds.  In  particular,  we  want  each  splitting  operation 
to  take  0(1)  time  rather  than  0(1  -f  logmin{/i,/2}).  Fortunately  we  can  charge  the  split¬ 
ting  time  to  concatenations,  thereby  reducing  the  amortized  time  per  split  to  0(1)  while 
increasing  the  amortized  time  per  concatenation  by  only  a  constant  factor. 

The  argument  uses  the  idea  of  a  potential  function  [51].  We  define  the  potential  of  a 
sublist  of  size  /  to  be  -c(l  +  log2/),  where  c  is  the  constant  in  the  amortized  time  bound 
for  splitting.  We  define  the  total  potential  of  a  collection  of  sublists  to  be  the  sum  of 
their  individual  potentials.  We  define  the  nominal  time  of  a  sublist  operation  to  be  its 


amortized  time  bound  plus  the  net  increase  in  total  potential  it  causes.  For  any  sequence 
of  sublist  operations,  the  sum  of  the  nominal  times  equals  the  sum  of  the  amortized  time 
bounds  plus  the  final  total  potential  minus  the  initial  total  potential.  In  our  application, 
the  initial  total  potential  is  zero  (there  are  no  sublists  initially)  and  the  final  potential  is  at 
least  —  cn  log  n.  Thus  for  any  sequence  of  sublist  operations  the  sum  of  the  amortized  time 
bounds  is  O(nlogn)  plus  the  sum  of  the  nominal  times.  The  nominal  time  to  initialize  a 
sublist  of  /  items  is  0(/logn).  (The  initial  potential  is  zero;  the  final  potential  is  negative.) 
The  nominal  time  to  access  the  head  of  a  sublist  is  0(1).  (There  is  no  change  in  potential.) 
The  nominal  time  for  splitting  is  0(1).  (When  sublists  of  sizes  li  and  /2  are  formed  by  a  split, 
the  potential  change  is  c(-l  -  log2/i  -  1  -  log2  /2  +  1  +  log(/2  +  /2))  <  -clogmin{/i,/2}.) 
The  nominal  time  for  a  concatenation  is  at  most  a  constant  factor  times  its  amortized 
time  bound.  (When  sublists  of  sizes  li  and  /2  are  concatenated,  the  potential  change  is 
c(— 1  —  log(/i  +  h)  +  1  +  log/i  +  1  +  log/2)  <  c(l  +  logmin{/i,/2}.))  Thus  we  have  the 
following  result. 

Lemma  7.7  Finger  search  trees  are  a  suitable  representation  of  sublists. 

8  Refinement  Using  Blocking  Flows 

It  is  natural  to  ask  how  much  harder  the  minimum-cost  circulation  problem  is  than  the 
maximum  flow  problem.  Based  on  the  results  of  Sections  4-7,  one  might  conjecture  that  the 
minimum-cost  circulation  problem  can  be  solved  in  0(min{log( nC),  m  log  n})  iterations  of 
a  maximum  flow  subroutine.  Although  we  are  unable  to  prove  this  conjecture,  we  show  in 
this  section  that  0(nmin{log(nC),mlogn})  iterations  of  a  blocking  flow  subroutine  suffice 
to  compute  a  minimum-cost  circulation.  The  maximum  flow  algorithm  of  Dinic  [11],  as 
well  as  the  many  maximum  flow  algorithms  based  on  his  approach,  e.g.  [35,38,46,53],  find 
a  maximum  flow  by  solving  0(n)  blocking  flow  problems.  Thus  our  result  shows  that  the 
time  to  solve  a  minimum-cost  circulation  problem  is  only  0(min{log( nC),  mlogn})  times 
greater  than  the  time  to  find  a  maximum  flow  using  Dinic’s  approach. 

To  describe  our  method  we  need  some  standard  definitions.  Let  G  =  (V,E)  be  a  directed 
graph  with  a  capacity  function  u  and  with  two  distinguished  vertices,  a  source  s  and  a  sink 
t.  A  pseudoflow  on  G  is  a  flow  if  every  vertex  except  s  and  t  has  zero  excess.  Observe  that 
if  /  is  a  flow,  e/(t)  =  —e/(s).  The  value  of  a  flow  /  is  e/(<).  A  flow  is  maximum  if  ej(t) 
is  as  large  as  possible.  A  flow  /  is  blocking  if  any  path  from  s  to  t  in  G  contains  at  least 


one  saturated  arc,  i.e.  an  arc  ( v,w )  such  that  u/(v,t a)  =  0.  A  maximum  flow  is  blocking, 
but  not  conversely.  A  directed  graph  is  acyclic  if  it  contains  no  cycles.  It  is  layered  if  its 
vertices  can  be  assigned  integer  layers  in  such  a  way  that  layer  ( v )  =  layer  ( w )  +  1  for  every 
arc  (v,w).  A  layered  graph  is  acyclic  but  not  conversely. 

The  main  step  of  our  method  is  the  computation  of  a  blocking  flow  on  an  acyclic  network. 
Many  of  the  known  blocking  flow  algorithms  are  stated  for  layered  networks,  but  they  apply 
to  acyclic  networks  as  well  (e.g.  [35,38,53]).  The  blocking  flow  algorithms  of  Galil  [22]  and 
Shiloach  and  Vishkin  [46]  can  be  extended  from  layered  to  acyclic  networks  without  affecting 
their  asymptotic  time  bounds  (see  [26]). 

8.1  The  Blocking  Flow  Method 

We  shall  describe  an  implementation  of  refine  that  reduces  c  by  a  factor  of  two  by  computing 
O(n)  blocking  flows.  This  method,  called  the  blocking  flow  method ,  consists  of  refine  as 
described  in  Section  5  (Figure  4)  with  the  original  loop  replaced  as  described  in  Figure  15. 
The  new  loop  repeatedly  modifies  the  current  pseudoflow  until  it  is  a  circulation.  To  modify 
the  pseudoflow,  the  method  first  partitions  the  vertices  of  G  into  two  sets  S  and  ~§,  such 
that  5  contains  all  vertices  reachable  in  the  current  admissible  graph  Ga  from  vertices  of 
positive  excess.  Vertices  in  S  have  their  prices  increased  by  c.  Next,  an  auxiliary  network  N 
is  constructed  by  adding  to  Ga  a  source  s,  a  sink  t,  an  arc  (s,  v)  of  capacity  e/(u)  for  each 
vertex  v  with  e/(v)  >  0,  and  an  arc  (v,f)  of  capacity  —  ej(v)  for  each  vertex  with  e/(v)  <  0. 
An  arc  («,  w)  6  Ea  has  capacity  uj(v,w)  in  N.  A  blocking  flow  b  on  N  is  found.  Finally, 
the  pseudoflow  /  is  replaced  by  the  pseudoflow  f'(v ,  w )  =  f(v,  w)  +  b(v,  w)  for  (r,  w)  £  E. 
The  correctness  of  the  blocking  flow  method  follows  from  the  next  lemma. 

Lemma  8.1  The  set  S  computed  in  the  inner  loop  contains  only  vertices  v  with  e/(v)  >  0. 
At  the  beginning  of  an  iteration  of  the  loop,  f  is  an  (-optimal  pseudoflow  with  respect  to  the 
price  function  p.  Increasing  the  prices  of  vertices  in  S  preserves  the  e- optimality  of  f .  The 
admissible  graph  remains  acyclic  throughout  the  algorithm. 

Proof:  The  proof  is  by  induction  on  the  number  of  iterations  of  the  inner  loop.  At  the 
beginning  of  an  iteration  of  the  loop,  there  is  no  path  in  Ga  from  a  vertex  v  with  positive 
excess  to  a  vertex  w  of  negative  excess;  for  the  second  and  subsequent  iterations  this  follows 
from  the  addition  to  /  of  a  blocking  flow  during  the  previous  iteration.  Increasing  the 


procedure  refine{(,  f,  p); 

[initialization] 

t  —  </2; 

V(t>,  ti>)  €  E  do  if  Cp( v,  w)  <  0  then  f(v,  w)  «—  u(t>,  u/); 

[loop] 

while  /  is  not  a  circulation  do  begin 

5  «—  {»  €  K|3u  €  V  such  that  ey  >  0  and  v  is  reachable  from  u  in  Ga}\ 
Vv  €  S,  p(v)  —  p(v)  +  c; 

let  S  be  the  network  formed  from  Ga  by  adding  a  source  t,  a  sink  t, 
an  arc  (s,v)  of  capacity  e/(v)  for  each  v  £V  with  e/(v)  >  0,  and 
an  arc  (v,t)  of  capacity  -ey(ti)  for  each  v  £  V  with  ey(v)  <  0; 
find  a  blocking  flow  6  on  N ; 

V(»,  u>)  €  Ea ,  f(v,  w)  —  f(v,  w)  +  fc( v,  w); 

end; 

return  (e,f,p); 

end. 


Figure  15:  The  blocking  refine  subroutine. 


price  of  every  vertex  in  S  by  e  preserves  the  c-optimality  of  /,  since  before  the  increase 
every  residual  arc  (v,w)  with  veS,w(S  has  cp(v,  w)  >  0.  The  only  time  new  arcs  become 
admissible  is  as  a  result  of  such  a  price  increase.  This  increase  can  add  new  admissible  arcs 
from  S  to  y  but  will  make  all  arcs  from  S'  to  S  non-admissible.  Thus  the  admissible  graph 
remains  acyclic.  | 

Remark:  If  we  are  given  a  price  function  p  with  respect  to  which  some  circulation  is  optimal, 
then  we  can  find  an  optimal  circulation  /  by  means  of  a  single  maximum  flow  computation. 
The  method  resembles  the  blocking  flow  approach.  We  begin  by  constructing  the  pseudoflow 
/  that  saturates  all  negative-cost  arcs  and  is  zero  on  all  other  arcs.  Then  we  construct  the 
subgraph  of  the  admissible  graph  containing  all  zero-cost  arcs.  To  this  graph,  we  add  a 
source  s,  a  sink  t,  an  arc  (a,v)  of  capacity  e/(v)  for  each  vertex  v  with  c/(r)  >  0,  and  an 
arc  (v,f)  of  capacity  -e/(r)  for  each  vertex  v  with  e/( v)  <  0.  Finally,  we  find  a  maximum 
flow  f  in  this  network  and  use  /'  to  augment  /.  (The  optimality  of  p  implies  that  f  will 
saturate  all  arcs  leaving  «  and  all  arcs  entering  t.)  | 


8.2  Analysis  of  the  Blocking  Flow  Method 

The  following  lemma  bounds  the  number  of  blocking  flow  computations.  It  is  analogous 
to  the  bound  of  Bland  and  Jensen  (7)  on  the  number  of  maximum  flow  computations  in  a 


scaling  step  of  their  algorithm. 


Lemma  8.2  The  number  of  iterations  of  the  inner  loop  in  the  blocking  flow  implementation 
of  refine  is  at  most  3n. 

Proof:  The  inner  loop  never  changes  a  vertex  excess  from  nonnegative  to  negative  or 
from  nonpositive  to  positive.  Thus  all  vertices  with  negative  excess  never  have  their  price 
changed,  and  all  vertices  of  positive  excess  have  had  their  price  increased  by  ie  after  i  it¬ 
erations  of  the  inner  loop.  The  same  proof  as  that  of  Lemma  5.8  shows  that  no  vertex  of 
positive  excess  can  have  a  price  change  exceeding  3 ne.  Hence  there  are  at  most  3n  iterations. 

I 

Theorem  8.3  The  blocking  flow  implementation  of  refine  runs  in  0(nB(n,  m ))  time,  giving 

an  0(nB(n ,  m)  min{log( nC),  m  log  n})  bound  for  finding  a  minimum-cost  circulation,  where 

B{n,  m)  is  the  time  needed  to  find  a  blocking  flow  in  an  acyclic  network  with  n  vertices  and 
m  arcs. 

Proof :  Immediate  from  Lemma  8.2.  | 

Theorem  8.3  implies  that  the  blocking  flow  approach  yields  implementations  of  refine 
with  running  times  of  0(n3),  0(n6/3m2/3),  and  0(nm log n),  based  on  the  known  bounds  for 
computing  a  blocking  flow  in  an  acyclic  network,  namely  0(n2)  [35,38,46,53],  0(n2/3m2/3) 
[22],  and  0(m  log  n)  [47,48].  The  algorithms  of  Shiloach  and  Vishkin  [46]  and  Galil  [22]  must 
be  extended  to  handle  acyclic  networks  instead  of  layered  networks,  but  such  extension  does 
not  affect  their  asymptotic  running  times. 

In  the  next  section  we  shall  develop  an  0(mlog(n2/m))-time  blocking  flow  algorithm, 
which  in  combination  with  Theorem  8.3  yields  an  0(nmlog(n2/m)min{log(nC),mlogn}) 
time  bound  for  computing  a  minimum-cost  circulation.  This  matches  the  bound  obtained 
in  Section  7. 

The  blocking  flow  approach  also  yields  efficient  parallel  and  distributed  minimum-cost 
circulation  algorithms.  These  use  the  Shiloach- Vishkin  parallel  blocking  flow  algorithm, 
which  runs  in  0(n  log  n)  time  using  n  processors  and  0(nm)  memory  [46].  Vishkin  (private 
communication,  1986)  has  reduced  the  memory  requirement  of  the  algorithm  from  the 
published  bound  of  0(nm)  to  0(n2).  Previous  results  known  to  us  are  as  follows.  The 
Shiloach- Vishkin  algorithm  can  be  used  in  the  cost-scaling  methods  of  Rock  [45]  and  Bland 
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and  Jensen  [7]  to  yield  an  0(n3(logn)log(nC))-time  n- processor  parallel  algorithm.  The 
blocking  flow  approach  improves  this  bound  by  a  factor  of  n.  Bertsekas  [4]  has  proposed 
a  “chaotic”  algorithm  for  the  minimum-cost  circulation  problem  that  converges  in  a  finite 
number  of  steps  in  a  distributed  model. 

Theorem  8.4  The  blocking  flow  approach  yields  a  minimum-cost  circulation  algorithm 
running  in  0(nJ(log n)min{log( nC),  mlogn})  time  using  n  processors  and  0(n2)  memory. 
These  bounds  are  valid  for  either  the  PRAM  computation  model  [14]  without  concurrent 
writing  or  the  DRAM  model  [37]. 

The  same  method  gives  good  bounds  in  distributed  models  of  parallel  computation 
[2,24].  In  the  statement  of  results  below,  A„  denotes  the  degree  of  vertex  v\  each  vertex  v 
has  a  processor  pv  and  communication  links  with  all  adjacent  vertices. 

Theorem  8.5  The  blocking  flow  approach  yields  a  minimum-cost  circulation  algorithm 
for  the  synchronous  distributed  model  running  in  0(n2  min{log( nC),  mlogn})  time  using 
0(n3  min{log( nC),  mlogn})  total  messages  and  0(nAv)  memory  per  processor  p„.  In  the 
asynchronous  distributed  model,  the  time  increases  to  0(n2(logn) min{log(nC7), mlogn}); 
the  bounds  on  message  complexity  and  memory  remain  the  same. 

Proof:  The  bounds  for  the  synchronous  model  follow  from  the  observation  that  the  Shiloach- 
Vishkin  blocking  flow  algorithm  runs  on  the  distributed  model  in  0{n 2)  time  using  0(n3) 
messages  and  0(nAu)  memory  per  processor.  The  bounds  for  the  asynchronous  model  fol¬ 
low  using  the  synchronization  protocol  of  Awerbuch  [2],  which  increases  the  time  bound  by 
a  factor  of  O(logn).  | 

Remark:  A  disadvantage  of  the  Shiloach-Vishkin  algorithm  is  the  greater  than  linear  (i.e. 
n(n2))  memory  requirement.  The  generic  refine  method  has  a  parallel  version  that  pro¬ 
cesses  all  active  vertices  in  parallel  (see  [26]).  It  uses  only  a  linear  amount  of  memory. 
Unfortunately,  the  running  time  bound  for  this  algorithm  is  worse  than  the  running  time 
bound  for  the  Shiloach-Vishkin  algorithm.  It  is  still  possible,  however,  that  a  variation  of 
this  algorithm  runs  in  the  same  time  as  the  Shiloach-Vishkin  algorithm  and  uses  a  linear 
amount  of  memory.  See  [26]  for  a  discussion  of  implications  of  this  conjecture.  Although 
the  theoretical  bound  on  the  running  time  of  the  parallel  version  of  the  generic  method  is 
no  better  than  the  time  bound  of  the  best  sequential  version,  it  is  still  possible  that  the 
parallel  version  will  perform  well  in  practice. 
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8.3  Finding  a  Blocking  Flow 


In  this  section  we  propose  a  new  sequential  algorithm  for  finding  a  blocking  flow  in  an 
acyclic  network.  The  algorithm  combines  ideas  in  the  wave  algorithm  [53]  with  the  data 
structures  discussed  in  Section  7.  It  runs  in  0(mlog(n2/m))  time.  Since  the  new  algorithm 
closely  resembles  the  dynamic  tree  implementation  of  refine ,  we  shall  merely  sketch  the 
ideas  and  omit  the  analysis. 

Let  G  =  (V,  E)  be  an  acyclic  graph  with  source  s,  sink  t,  and  capacity  function  u.  The 
algorithm  maintains  a  preflow  /  on  G ,  initially  such  that  f(s,  v)  =  u(s,  v)  for  v  6  V  and 
/( v,  w)  =  0  for  v,  w  e  V  -  {s}.  Each  vertex  v  is  in  one  of  two  states,  blocked  or  unblocked. 
An  unblocked  vertex  can  become  blocked,  but  not  conversely.  Initially  all  vertices  are 
unblocked.  A  vertex  v  #  {s,  t]  is  active  if  e/(v)  >  0. 

The  algorithm  makes  use  of  the  three  operations  push,  pull,  and  block.  A  push  operation 
applies  to  an  arc  (r,tu)  such  that  t;  is  active,  v  and  w  are  unblocked,  and  uj(v,w)  >  0.  It 
consists  of  increasing  f(v,w)  by  min{e/(t>),  u/(w,tn)}.  A  pull  operation  applies  to  an  arc 
(v,  w)  such  that  w  is  active  and  blocked  and  f(v,  to)  >  0.  It  consists  of  decreasing  f{v,  w) 
by  min{e/(«7),  f(v,w)}.  A  6/ocifc  operation  applies  to  an  unblocked  vertex  v  such  that,  for 
every  arc  (v,w),  either  u/(v,to)  =  0  or  w  is  blocked.  It  consists  of  making  v  blocked. 

The  generic  blocking  flow  algorithm  consists  of  performing  push,  pull  and  block  operations 
in  any  order  until  no  vertex  is  active,  at  which  time  the  current  preflow  is  a  blocking  flow. 
The  proof  of  correctness  resembles  the  proof  of  correctness  of  the  generic  version  of  refine 
presented  in  Section  5.  The  generic  algorithm  can  be  implemented  in  a  straightforward  way 
to  run  in  O(nm)  time.  This  result  is  analogous  to  the  0(n2m)  time  bound  for  the  generic 
version  of  refine  derived  in  Section  6.  In  the  analysis,  vertex  blockings  play  the  same  role 
as  relabelings  do  in  the  analysis  of  refine.  There  are  at  most  n  -  2  vertex  blockings. 

We  improve  the  algorithm  by  introducing  the  discharge  operation.  A  discharge  operation 
applies  to  an  active  vertex  v.  The  operation  applies  a  push,  block,  or  pull  operation  to  v, 
whichever  applies,  and  repeats  this  until  v  becomes  inactive.  The  first-active  blocking  flow 
algorithm,  similar  to  the  first-active  version  of  refine,  uses  a  list  L  to  determine  the  order 
in  which  to  process  active  vertices.  List  L  contains  all  vertices  of  G,  initially  in  topological 
order.  The  algorithm  consists  of  repeating  the  following  three  steps  until  there  are  no  active 
vertices:  Select  the  first  active  vertex  on  L,  say  v.  Apply  a  discharge  operation  to  v.  If  the 
discharge  has  made  v  blocked,  move  v  to  the  front  of  L.  This  algorithm  runs  in  0(n2)  time. 


The  key  observation  is  that  each  vertex  can  be  discharged  at  most  once  between  vertex 
blockings. 

We  improve  the  algorithm  further  by  implementing  it  using  Anger  search  trees  to  rep¬ 
resent  L  and  dynamic  trees  to  represent  the  set  of  arcs  eligible  for  pushing  and  pulling. 
The  details  and  the  analysis  are  similar  to  those  of  the  tree  version  of  refine  presented  in 
Section  7.  The  running  time  of  this  algorithm  is  0(m  log(  n2/m)).  The  factor  of  n  savings 
over  the  time  bound  of  the  tree  version  of  refine  comes  from  the  fact  that  there  are  O(n) 
vertex  blockings  instead  of  0(n2)  relabelings. 

This  algorithm  in  combination  with  Theorem  8.3  gives  the  following  result: 

Theorem  8.6  The  blocking  flow  approach  to  the  minimum  cost  circulation  problem  has  an 
0(nm  log(n2/m)  min{log(nC),  m  log  n})-time  implementation. 

9  Practical  Improvements 

In  this  section  we  discuss  the  practicality  of  our  approach.  Initial  practical  experience  with 
scaling  algorithms  for  the  minimum-cost  flow  problem  was  disappointing.  As  a  result,  such 
algorithms  were  condemned  as  impractical  in  the  literature,  and  simplex-based  algorithms 
continued  to  be  used  in  practice.  This  situation  did  not  change  even  though  new  scaling 
algorithms  were  developed.  Recently,  however,  Bland  and  Jensen  conducted  a  study  to 
compare  their  scaling  algorithm  with  simplex- based  codes  [7].  They  found  their  algorithm 
competitive,  and  concluded  that  the  practicality  of  scaling  algorithms  should  be  investigated 
further.  Their  results  are  especially  promising  since  they  used  the  maximum  flow  algorithm 
of  Malhotra,  Promodh  Kumar,  and  Maheshwari  [38]  as  a  subroutine.  This  algorithm, 
though  simple  to  code,  is  likely  to  be  slower  in  practice  than  other,  newer  maximum  flow 
algorithms,  e.g.  [1,29].  The  results  of  Bland  and  Jensen  also  agree  with  the  results  of 
Bateson  [3],  who  compared  Gabow’s  scaling  algorithm  for  the  assignment  problem  [19]  with 
the  Hungarian  method. 

We  are  extremely  optimistic  that  our  approach  will  yield  highly  practical  algorithms. 
Our  optimism  is  based  on  the  work  of  Bland  and  Jensen  discussed  above,  as  well  as  the  first 
author’s  experience  [26]  with  sequential  and  parallel  implementations  of  our  maximum  flow 
algorithm.  (See  also  [42].)  A  careful  experimental  study  is  needed,  however,  before  a  claim 
of  practicality  can  be  made  with  certainty. 

We  suggest  several  heuristics  that  may  improve  the  practical  performance  of  the  algo- 
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rithm.  We  expect  that  in  practice  the  extra  overhead  needed  to  make  the  algorithm  strongly 
polynomial  is  likely  to  make  the  algorithm  slower,  and  should  be  omitted.  Instead,  we  can 
avoid  calling  refine  until  the  current  error  parameter  is  within  a  factor  of  two  of  minimum. 
That  is,  after  halving  e  but  before  calling  refine,  we  do  a  shortest  path  computation  as 
described  in  Section  3.1  to  discover  whether  the  current  circulation  is  e-optimal.  (If  not, 
the  shortest  path  algorithm  will  find  a  negative-cost  cycle.)  If  so,  we  halve  (  again  and 
repeat.  If  not,  we  call  refine.  With  this  method,  the  bound  on  the  number  of  iterations  is 
still  0(log(nC)),  but  some  of  the  iterations  require  only  a  shortest  path  computation. 

Another  improvement  is  to  delete  arcs  when  their  flows  become  fixed,  as  suggested  by 
the  lemma  of  Tardos  discussed  in  Section  4.2.  In  order  to  delete  arcs  whose  flows  can  be 
fixed,  we  modify  the  capacities  after  every  iteration  of  the  outer  loop  by  subtracting  the 
current  circulation.  This  transforms  the  problem  so  that  the  current  circulation  is  the  zero 
circulation.  We  then  delete  every  arc  (v,w)  whose  reduced  cost  has  absolute  value  at  least 
ne.  This  modified  algorithm  needs  only  O(m)  extra  time  per  iteration  of  the  outer  loop. 

Our  approach  to  the  minimum-cost  flow  problem  allows  a  great  degree  of  flexibility.  For 
example,  the  refine  subroutine  is  used  to  reduce  the  error  parameter  by  a  factor  of  two, 
but  it  can  instead  be  used  to  reduce  the  error  parameter  c  by  any  factor  (at  the  cost  of 
increasing  the  time  bound  by  a  related  factor).  Although  fine-tuning  this  factor  does  not 
improve  the  asymptotic  running  time  of  the  algorithm,  it  will  affect  practical  performance. 
Also,  as  in  the  case  of  the  maximum  flow  algorithm,  alternative  orderings  of  the  update 
operations  push  and  relabel  may  result  in  better  performance. 

Another  potential  way  of  improving  the  performance  of  the  generic  refine  subroutine 
would  be  to  use  shortest  path  computations  during  the  execution  of  the  subroutine  to 
update  the  price  function.  This  improvement  would  be  similar  to  using  breadth-first  search 
to  update  distance  labels  in  our  maximum  flow  algorithm  [29].  One  would  have  to  be 
careful,  however,  to  assure  that  these  updates  of  the  price  function  preserve  the  acyclicity 
of  the  admissible  graph. 

10  Remarks 

In  this  section  we  discuss  possible  theoretical  improvements  and  extensions  of  our  results. 
The  most  obvious  open  question  is  whether  the  refine  procedure  can  be  modified  so  that 
it  runs  faster.  The  analogy  with  our  maximum  flow  algorithm  suggests  the  possibility 
of  obtaining  an  0(n2  log  U  +  nm)  bound  for  a  suitable  version  of  refine,  based  on  the 


0(n2  log  U  +  nm)-time  maximum  flow  algorithm  of  Ahuja  and  Orlin  [1].  Unfortunately,  the 
example  in  the  appendix  suggests  that  some  new  idea  will  be  needed  if  this  bound  is  to  be 
obtained.  Another  question  is  whether  the  strongly  polynomial  framework  of  Section  4.2 
can  be  improved  to  reduce  the  number  of  iterations  of  refine  by  a  factor  of  mfn.  Such  an 
improvement  would  make  our  approach  competitive  with  the  algorithm  of  Galil  and  Tardos 
on  both  sparse  and  dense  graphs. 

Although  all  the  versions  of  refine  that  we  have  considered  here  reduce  the  error  param¬ 
eter  by  a  constant  factor,  one  can  consider  other  implementations  of  refine  that  reduce  the 
error  parameter  by  a  factor  that  depends  on  the  parameters  of  the  input  problem.  In  our 
recent  paper  [28],  we  describe  an  implementation  of  refine  that  reduces  the  error  parame¬ 
ter  by  a  factor  of  (n  —  l)/n.  The  resulting  running  time  bounds  are  competitive  with  the 
bounds  of  this  paper. 

Our  general  approach  is  to  scale  costs.  It  would  be  interesting  to  see  if  there  is  a  similar 
method  that  scales  capacities,  or  that  scales  costs  and  capacities  simultaneously.  Such  a 
double  scaling  approach  might  lead  to  improved  complexity  bounds. 

Both  the  maximum  flow  problem  and  the  shortest  path  problem  are  special  cases  of 
the  minimum-cost  flow  problem.  Our  result  gives  a  bound  for  the  minimum-cost  flow 
problem  only  a  logarithmic  factor  larger  than  the  sum  of  the  best  known  bounds  for  these 
subproblems.  A  nice  extension  would  be  a  reduction  of  the  minimum-cost  flow  problem  to 
the  solution  of  a  logarithmic  number  of  maximum  flow  and  shortest  path  problems. 

A  more  general  question  is  whether  the  scaling  approach  can  be  used  to  efficiently  solve 
other  network  optimization  problems.  Natural  candidates  include  problems  involving  flows 
with  gains  and  multicommodity  flow  problems. 

If  all  edges  have  unit  capacities,  all  pushes  performed  by  the  generic  algorithm  are  sat¬ 
urating,  and  the  generic  minimum-cost  circulation  algorithm  runs  in  0(nm  log( nC))  time. 
Using  the  blocking  flow  approach  presented  in  Section  8,  Harold  Gabow  and  the  second  au¬ 
thor  [20,21]  have  obtained  improved  algorithms  for  solving  a  minimum-cost  flow  problem  on 
a  network  with  small  capacities.  Such  a  problem  can  be  solved  in 
0(n2/3t/1/3n»log(n2/m)log(nC'))  time,  assuming  that  the  graph  contains  no  multiple  arcs. 
If  U  =  0(1),  the  bound  becomes  C^minlm1/2,?!2/3}??!  log( nC)).  The  approach  also  gives  an 
algorithm  for  the  assignment  problem  (bipartite  weighted  matching)  running  in 
0(nll*m\of^nC))  time.  The  method  extends  to  the  nonbipartite  weighted  matching  prob¬ 
lem,  giving  a  bound  of  0((n«(m,n)iogn),/2»*log(n6')),  where  a(m,n)  is  a  functional  in- 


verse  of  Ackermann’s  function.  In  this  case  the  algorithm  becomes  much  more  complicated, 
because  sophisticated  data  structures  are  needed  to  implement  it  efficiently.  The  best 
previously  bound  is  0(n3/4m  log  C)  for  both  bipartite  and  nonbipartite  weighted  match¬ 
ing  [18,19].  The  new  bounds  for  weighted  matching  are  close  to  the  best  known  bound, 
0(n1/2m),  for  maximum  cardinality  matching  [31,40].  The  new  nonbipartite  weighted 
matching  algorithm  also  specializes  in  the  case  of  maximum  cardinality  matching  to  yield 
an  0(n1/2m)-time  algorithm  that  is  somewhat  simpler  than  that  of  Micali  and  Vazirani 
[40]. 

Appendix.  A  Hard  Network  for  the  First-In,  First-Out  Al¬ 
gorithm 

The  first-in,  first-out  (FIFO)  implementation  of  refine ,  described  in  Section  6.2,  can  make 
Q(n3)  passes  over  its  queue.  A  network  on  which  this  can  happen  is  shown  in  Figure  16. 
The  network  consists  of  two  paths,  xn,  arn-i,  • . . ,  x,  and  y,,  y2,  •  •  •  ,  yn,  and  three  extra  arcs, 
(si.yi)*  (yn,xn),  and  (x„,xj).  The  arcs  have  the  following  residual  capacities  and  costs, 
where  h  is  a  sufficiently  large  integer  (h  —  n2  will  do). 

u/(ii,x<_i)  =  h,  c(xi,x<_i)  =  1  for  2  <  i  <  n 

«/(y«>  y«+i)  =  h,  c(y,-,  yf+1)  =  -2  for  1  <  i  <  n  -  1 

«/(*i,yi)  =  h-  1,  c(x1,y1)  =  0 
u/(yn,*n)  =  h,  c(yn,xn)  =  -  2 
**/(*n»*l)  —  c(xn,Xj)  =  1 

All  reverse  arcs  (e.g.  (x,,  x,+i))  have  a  residual  capacity  of  zero. 

Suppose  refine  is  given  this  network  with  the  zero  flow,  the  zero  price  function,  and 
e  =  1.  First,  each  arc  (y,-,y,+i)  is  saturated,  as  is  the  arc  (yn>in)-  This  results  in  an  excess 
of  h  at  vertex  xn  and  an  excess  of  -h  at  vertex  y\.  Next  (with  unfavorable  tiebreaking)  the 
excess  of  h  moves  from  xn  along  the  path  to  xj,  in  the  process  increasing  the  potential  of 
each  of  the  vertices  X2,X3, . .  .,x„  to  two.  After  the  potential  of  x,  is  raised  to  one,  h  —  1 
of  the  excess  flows  to  y,,  where  it  cancels  all  but  one  unit  of  negative  excess.  The  potential 
of  xj  then  is  increased  to  two.  The  current  state  is  shown  in  Figure  17;  the  only  remaining 
positive  excess,  of  one  unit,  is  on  x,. 

Now  the  unit  of  excess  moves  repeatedly  around  the  cycle  xi,x2, . .  .,x„,xi.  The  first 
time  this  happens,  the  price  of  xn  goes  up  by  two.  The  second  time,  the  price  of  xn_i 
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goes  up  by  two.  After  n  iterations,  all  of  Z\,  x?, . . .  ,xn  have  increased  by  two  in  price. 
Then  the  process  repeats.  The  unit  of  excess  will  take  occasional  excursions  along  the  path 
xn,  Vni  J/n-i  i  • . .,  jfi,  but  it  will  return  to  xn  and  continue  going  around  the  cycle.  Finally, 
after  f l(n2)  traversals  of  the  cycle,  the  unit  of  excess  will  succeed  in  traversing  the  entire 
path  xn,yn,  yn~i, . . . ,  jq,  and  refine  will  terminate.  Since  there  are  n  passes  over  the  queue 
of  the  FIFO  algorithm  for  each  traversal  of  the  cycle,  the  total  number  of  FIFO  passes  is 
ft(n3). 
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